[ 
https://issues.apache.org/jira/browse/HDFS-15202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059638#comment-17059638
 ] 

Danil Lipovoy edited comment on HDFS-15202 at 3/15/20, 12:32 PM:
-----------------------------------------------------------------

Yes, you are right, there are ways improve performance even more. On the other 
hand it would be more difficult. Can we go step by step, something like agile 
approch? Any way, since it is better then before, can we merge it? It is 
important for me) I would test some CRC32 for example later. 


was (Author: pustota):
Yes, you are right, there are ways improve performance even more. On the other 
hand it would be more difficult. Can we go step by step, something like agile 
way? Any way, since it is better then before, can we merge it? It is important 
for me) I would test some CRC32 for example later. 

> HDFS-client: boost ShortCircuit Cache
> -------------------------------------
>
>                 Key: HDFS-15202
>                 URL: https://issues.apache.org/jira/browse/HDFS-15202
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfsclient
>         Environment: 4 nodes E5-2698 v4 @ 2.20GHz, 700 Gb Mem.
> 8 RegionServers (2 by host)
> 8 tables by 64 regions by 1.88 Gb data in each = 900 Gb total
> Random read in 800 threads via YCSB and a little bit updates (10% of reads)
>            Reporter: Danil Lipovoy
>            Assignee: Danil Lipovoy
>            Priority: Minor
>         Attachments: HDFS_CPU_full_cycle.png, cpu_SSC.png, cpu_SSC2.png, 
> hdfs_cpu.png, hdfs_reads.png, hdfs_scc_3_test.png, 
> hdfs_scc_test_full-cycle.png, locks.png, requests_SSC.png
>
>
> ТотI want to propose how to improve reading performance HDFS-client. The 
> idea: create few instances ShortCircuit caches instead of one. 
> The key points:
> 1. Create array of caches (set by 
> clientShortCircuitNum=*dfs.client.short.circuit.num*, see in the pull 
> requests below):
> {code:java}
> private ClientContext(String name, DfsClientConf conf, Configuration config) {
> ...
>     shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
>     for (int i = 0; i < this.clientShortCircuitNum; i++) {
>       this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf);
>     }
> {code}
> 2 Then divide blocks by caches:
> {code:java}
>   public ShortCircuitCache getShortCircuitCache(long idx) {
>     return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
>   }
> {code}
> 3. And how to call it:
> {code:java}
> ShortCircuitCache cache = 
> clientContext.getShortCircuitCache(block.getBlockId());
> {code}
> The last number of offset evenly distributed from 0 to 9 - that's why all 
> caches will full approximately the same.
> It is good for performance. Below the attachment, it is load test reading 
> HDFS via HBase where clientShortCircuitNum = 1 vs 3. We can see that 
> performance grows ~30%, CPU usage about +15%. 
> Hope it is interesting for someone.
> Ready to explain some unobvious things.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to