[ 
https://issues.apache.org/jira/browse/HADOOP-17222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306606#comment-17306606
 ] 

Mingliang Liu commented on HADOOP-17222:
----------------------------------------

Thanks [~sodonnell]. 

I guess I do not work with clusters of more than 1000 nodes nowadays (we have
more, smaller clusters), so I do not have a preference about a cache size above
1000. But I totally understand your use case. Since this is just a maximum size
(or capacity) rather than the typical actual size, I think it makes sense if
you want to increase the hard limit from 1000 to something like 3000. Picking
the number is more of an art, so I assume anything above 2000 makes sense.

Yes, I think this can be backported to older branches like 3.3. I vaguely
recall that there is no 3.4-specific logic or assumption, but I will defer to
[~fanrui] and [~hexiaoqiao] to confirm. I can help with the backport. So, are
3.2/3.1 also doable? 
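To make the capacity point concrete, here is a minimal Java sketch of the cache idea from the issue description: socket addresses kept in a ConcurrentHashMap keyed by host and port, with a hard cap on the number of entries. The class name SocketAddrCache, the maxSize parameter, and the "stop caching once full" policy are hypothetical choices for illustration, not the code in Hadoop's NetUtils. It also uses InetSocketAddress.createUnresolved to avoid DNS lookups, whereas the real method may resolve the host.

```java
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: cache InetSocketAddress objects keyed by "host:port"
// so the comparatively expensive URI parsing runs only on a cache miss.
// This is not the actual Hadoop NetUtils implementation.
class SocketAddrCache {
  private final ConcurrentMap<String, InetSocketAddress> cache =
      new ConcurrentHashMap<>();
  // Hard limit on the number of entries (a capacity, not the expected size).
  private final int maxSize;

  SocketAddrCache(int maxSize) {
    this.maxSize = maxSize;
  }

  InetSocketAddress get(String host, int port) {
    String key = host + ":" + port;
    // Fast path: a single ConcurrentHashMap.get() on a hit.
    InetSocketAddress cached = cache.get(key);
    if (cached != null) {
      return cached;
    }
    InetSocketAddress created = create(key);
    // Only cache while under the cap. The size check is not atomic with the
    // insert, so concurrent misses can overshoot maxSize slightly.
    if (cache.size() < maxSize) {
      InetSocketAddress previous = cache.putIfAbsent(key, created);
      if (previous != null) {
        return previous; // another thread cached this key first
      }
    }
    return created;
  }

  int size() {
    return cache.size();
  }

  // Slow path: parse "host:port" via URI.create, standing in for the heavier
  // work described in the issue. Handles hostnames and IPv4 literals only.
  // createUnresolved skips DNS resolution to keep the sketch self-contained.
  static InetSocketAddress create(String target) {
    URI uri = URI.create("placeholder://" + target);
    return InetSocketAddress.createUnresolved(uri.getHost(), uri.getPort());
  }
}
```

With the limit discussed above, this would be constructed as `new SocketAddrCache(3000)`. Once the cap is reached, new keys are still answered correctly but are not cached; an LRU or time-based eviction policy would be an alternative design choice.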

 

>  Create socket address leveraging URI cache
> -------------------------------------------
>
>                 Key: HADOOP-17222
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17222
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common, hdfs-client
>         Environment: HBase version: 2.1.0
> JVM: -Xmx2g -Xms2g 
> hadoop hdfs version: 2.7.4
> disk:SSD
> OS:CentOS Linux release 7.4.1708 (Core)
> JMH Benchmark: @Fork(value = 1) 
> @Warmup(iterations = 300) 
> @Measurement(iterations = 300)
>            Reporter: fanrui
>            Assignee: fanrui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>         Attachments: After Optimization remark.png, After optimization.svg, 
> Before Optimization remark.png, Before optimization.svg
>
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Note: Not only the HDFS client benefits from this change; all callers of
> NetUtils.createSocketAddr benefit. The HDFS client is used here just as an
> example.
>  
> The HDFS client selects the best DataNode (DN) for an HDFS block. Method call
> stack:
> DFSInputStream.chooseDataNode -> getBestNodeDNAddrPair -> 
> NetUtils.createSocketAddr
> NetUtils.createSocketAddr creates the corresponding InetSocketAddress from
> the host and port. The method performs some relatively heavy operations, for
> example URI.create(target), so each call is comparatively expensive.
> The following is my performance report, based on HBase calling HDFS. HBase
> accesses HDFS at high frequency, because HBase read operations often read a
> small DataBlock (about 64 KB) rather than an entire HFile. Under such
> high-frequency access, the time spent in NetUtils.createSocketAddr becomes
> significant.
> h3. Test Environment:
>  
> {code:java}
> HBase version: 2.1.0
> JVM: -Xmx2g -Xms2g 
> hadoop hdfs version: 2.7.4
> disk:SSD
> OS:CentOS Linux release 7.4.1708 (Core)
> JMH Benchmark: @Fork(value = 1) 
> @Warmup(iterations = 300) 
> @Measurement(iterations = 300)
> {code}
> h4. Before Optimization FlameGraph:
> In the figure, we can see that DFSInputStream.getBestNodeDNAddrPair accounts
> for 4.86% of total CPU time, and URI creation accounts for a large share of
> that.
> !Before Optimization remark.png!
> h3. Optimization ideas:
> NetUtils.createSocketAddr creates an InetSocketAddress from a host and port,
> so we can add a cache for InetSocketAddress: the cache key is the host and
> port, and the value is the InetSocketAddress.
> h4. After Optimization FlameGraph:
> In the figure, we can see that DFSInputStream.getBestNodeDNAddrPair accounts
> for 0.54% of total CPU time. Here, a ConcurrentHashMap is used as the cache,
> so a lookup is a single ConcurrentHashMap.get() call. The CPU usage of
> DFSInputStream.getBestNodeDNAddrPair dropped from 4.86% to 0.54%.
> !After Optimization remark.png!
> h3. Original FlameGraph link:
> [Before 
> Optimization|https://drive.google.com/file/d/133L5m75u2tu_KgKfGHZLEUzGR0XAfUl6/view?usp=sharing]
> [After Optimization 
> FlameGraph|https://drive.google.com/file/d/133L5m75u2tu_KgKfGHZLEUzGR0XAfUl6/view?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
