[jira] [Created] (HADOOP-16901) HDFS-client: boost ShortCircuit Cache

Danil Lipovoy (Jira) Tue, 03 Mar 2020 10:00:10 -0800

Danil Lipovoy created HADOOP-16901:
--------------------------------------

             Summary: HDFS-client: boost ShortCircuit Cache
                 Key: HADOOP-16901
                 URL: https://issues.apache.org/jira/browse/HADOOP-16901
             Project: Hadoop Common
          Issue Type: New Feature
         Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem.


8 RegionServers (2 by host)

8 tables by 64 regions by 1.88 Gb data in each = 1200 Gb total

Random reads in 800 threads via YCSB, plus a small share of updates (10% of reads)


            Reporter: Danil Lipovoy
         Attachments: hdfs_cpu.png, hdfs_reads.png

I want to propose a way to improve the read performance of the HDFS client. The idea: 
create several ShortCircuitCache instances instead of one. 

The key points:
1. Create an array of caches:
{code:java}
private ClientContext(String name, DfsClientConf conf, Configuration config) {
...
    // Build clientShortCircuitNum independent caches instead of a single one.
    this.shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
    for (int i = 0; i < this.clientShortCircuitNum; i++) {
      this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf);
    }
...
}
{code}

2. Then distribute blocks across the caches:
{code:java}
  public ShortCircuitCache getShortCircuitCache(long idx) {
    return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
  }
{code}

3. And call it like this:
{code:java}
ShortCircuitCache cache =
    clientContext.getShortCircuitCache(block.getBlockId());
{code}
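
The three steps above can be sketched as one self-contained class. This is only an illustration, not the patch itself: CacheShard is a hypothetical stand-in for ShortCircuitCache, and ShardedCacheSketch, indexFor and shardFor are names made up for the example. One deliberate difference from the snippets above: the sketch uses Math.floorMod instead of the plain % operator, on the assumption that an ID could be negative, in which case % would produce a negative array index.

```java
/** Illustrative only: shows how IDs can be spread over several cache instances. */
final class ShardedCacheSketch {

  /** Hypothetical placeholder for one cache instance; it only counts lookups. */
  static final class CacheShard {
    long lookups;
  }

  private final CacheShard[] shards;

  ShardedCacheSketch(int shardCount) {
    if (shardCount <= 0) {
      throw new IllegalArgumentException("shardCount must be positive");
    }
    // Step 1: create an array of independent caches.
    shards = new CacheShard[shardCount];
    for (int i = 0; i < shardCount; i++) {
      shards[i] = new CacheShard();
    }
  }

  /** Step 2: map an ID to a slot. floorMod keeps the result in [0, shardCount). */
  static int indexFor(long id, int shardCount) {
    return (int) Math.floorMod(id, (long) shardCount);
  }

  /** Step 3: callers pick the cache by block ID. */
  CacheShard shardFor(long blockId) {
    return shards[indexFor(blockId, shards.length)];
  }

  public static void main(String[] args) {
    int shardCount = 3;
    ShardedCacheSketch caches = new ShardedCacheSketch(shardCount);
    long firstId = 1_073_741_825L; // arbitrary starting ID for the demo
    for (long blockId = firstId; blockId < firstId + 9_000; blockId++) {
      caches.shardFor(blockId).lookups++;
    }
    for (int i = 0; i < shardCount; i++) {
      System.out.println("shard " + i + ": " + caches.shards[i].lookups + " lookups");
    }
  }
}
```

With consecutive IDs, the per-shard counts printed by main differ by at most one, which is the even spread the proposal relies on.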

The last digit of the block ID is evenly distributed from 0 to 9, which is why 
all caches will fill up approximately equally.

This is good for performance. The attachments show a load test reading HDFS via 
HBase with clientShortCircuitNum = 3. Read performance grows by ~30% and CPU 
usage by about 15%. 

I will try to add a link to the pull request soon.
I hope this is interesting to somebody. 
I am ready to explain anything that is not obvious.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
