[
https://issues.apache.org/jira/browse/HADOOP-17222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306606#comment-17306606
]
Mingliang Liu commented on HADOOP-17222:
----------------------------------------
Thanks [~sodonnell].
I do not work with clusters of >1000 nodes these days (we have more, smaller
clusters), so I have no strong preference about a cache size above 1000, but I
totally understand your use case. Since this is just a maximum size (capacity)
rather than the typical actual size, I think it makes sense to raise the hard
limit from 1000 to something like 3000. Picking the number is more of an art
than a science, so I assume anything above 2000 would be reasonable.
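For illustration only (a hypothetical Guava-based sketch, not necessarily the
structure used by the patch): with a capacity-bounded cache, the configured
maximum is just a hard cap on how many entries the cache may hold, and memory
is spent only per entry actually inserted, so raising the cap is essentially
free on small clusters.
{code:java}
import java.net.InetSocketAddress;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class AddrCacheCapacityExample {
  // maximumSize is only a hard cap on cached entries: a cluster with 200
  // DataNodes still holds roughly 200 entries even if the cap is raised
  // from 1000 to 3000.
  private static final Cache<String, InetSocketAddress> ADDR_CACHE =
      CacheBuilder.newBuilder()
          .maximumSize(3000)   // hard limit, not the typical live size
          .build();
}
{code}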
Yes, I think this can be backported to older branches like 3.3. I vaguely
recall there is no 3.4-specific logic or assumption, but I will defer to
[~fanrui] and [~hexiaoqiao] to confirm. I can help with the backport. Would
3.2/3.1 also be doable?
> Create socket address leveraging URI cache
> -------------------------------------------
>
> Key: HADOOP-17222
> URL: https://issues.apache.org/jira/browse/HADOOP-17222
> Project: Hadoop Common
> Issue Type: Improvement
> Components: common, hdfs-client
> Environment: HBase version: 2.1.0
> JVM: -Xmx2g -Xms2g
> hadoop hdfs version: 2.7.4
> disk:SSD
> OS:CentOS Linux release 7.4.1708 (Core)
> JMH Benchmark: @Fork(value = 1)
> @Warmup(iterations = 300)
> @Measurement(iterations = 300)
> Reporter: fanrui
> Assignee: fanrui
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: After Optimization remark.png, After optimization.svg,
> Before Optimization remark.png, Before optimization.svg
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> Note: the benefit is not limited to the HDFS client; all callers of
> NetUtils.createSocketAddr benefit. The HDFS client is used here only as an
> example.
>
> The HDFS client selects the best DN for an HDFS block. Method call stack:
> DFSInputStream.chooseDataNode -> getBestNodeDNAddrPair ->
> NetUtils.createSocketAddr
> NetUtils.createSocketAddr creates the corresponding InetSocketAddress from
> the host and port. The method performs some relatively heavy operations, for
> example URI.create(target), so it takes a noticeable amount of time to
> execute.
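> To make the cost concrete, here is a simplified sketch of the un-cached path
> (not the exact NetUtils code; the class name and scheme string are
> illustrative). Every call re-parses the "host:port" string through
> java.net.URI before building the InetSocketAddress:
> {code:java}
> import java.net.InetSocketAddress;
> import java.net.URI;
>
> public class UncachedSocketAddrSketch {
>   // Simplified: the real NetUtils.createSocketAddr also handles default
>   // ports and hosts without ports; the overall shape is the same.
>   // Assumes target is of the form "host:port".
>   public static InetSocketAddress createSocketAddr(String target) {
>     // URI.create() validates and parses the authority on every call;
>     // this is the relatively expensive step on hot read paths.
>     URI uri = URI.create("dummy://" + target);
>     return new InetSocketAddress(uri.getHost(), uri.getPort());
>   }
> }
> {code}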
> The following is my performance report, based on HBase calling HDFS. HBase is
> a high-frequency HDFS client because HBase read operations often access a
> small DataBlock (about 64 KB) rather than an entire HFile. Under such
> high-frequency access, the NetUtils.createSocketAddr method becomes
> time-consuming.
> h3. Test Environment:
>
> {code:java}
> HBase version: 2.1.0
> JVM: -Xmx2g -Xms2g
> hadoop hdfs version: 2.7.4
> disk:SSD
> OS:CentOS Linux release 7.4.1708 (Core)
> JMH Benchmark: @Fork(value = 1)
> @Warmup(iterations = 300)
> @Measurement(iterations = 300)
> {code}
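> A minimal JMH harness matching these settings might look like the following
> (the target address string is illustrative):
> {code:java}
> import java.net.InetSocketAddress;
>
> import org.apache.hadoop.net.NetUtils;
> import org.openjdk.jmh.annotations.Benchmark;
> import org.openjdk.jmh.annotations.Fork;
> import org.openjdk.jmh.annotations.Measurement;
> import org.openjdk.jmh.annotations.Scope;
> import org.openjdk.jmh.annotations.State;
> import org.openjdk.jmh.annotations.Warmup;
>
> @Fork(value = 1)
> @Warmup(iterations = 300)
> @Measurement(iterations = 300)
> @State(Scope.Benchmark)
> public class CreateSocketAddrBenchmark {
>   // illustrative DataNode address; any "host:port" string works
>   private final String target = "127.0.0.1:50010";
>
>   @Benchmark
>   public InetSocketAddress createSocketAddr() {
>     return NetUtils.createSocketAddr(target);
>   }
> }
> {code}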
> h4. Before Optimization FlameGraph:
> In the figure, we can see that DFSInputStream.getBestNodeDNAddrPair accounts
> for 4.86% of the total CPU, and URI creation accounts for a large share of
> that time.
> !Before Optimization remark.png!
> h3. Optimization ideas:
> NetUtils.createSocketAddr creates an InetSocketAddress from the host and
> port, so we can cache the resulting InetSocketAddress. The cache key is the
> host and port, and the value is the InetSocketAddress; see the sketch below.
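> A minimal sketch of the caching idea (simplified from the actual patch; names
> are illustrative):
> {code:java}
> import java.net.InetSocketAddress;
> import java.net.URI;
> import java.util.concurrent.ConcurrentHashMap;
>
> public final class SocketAddrCacheSketch {
>   // key: "host:port" target string, value: the parsed InetSocketAddress
>   private static final ConcurrentHashMap<String, InetSocketAddress> CACHE =
>       new ConcurrentHashMap<>();
>
>   public static InetSocketAddress createSocketAddr(String target) {
>     // parse only on first access; later calls are a single hash lookup
>     return CACHE.computeIfAbsent(target, SocketAddrCacheSketch::parse);
>   }
>
>   private static InetSocketAddress parse(String target) {
>     // the expensive URI parsing step being cached; assumes "host:port"
>     URI uri = URI.create("dummy://" + target);
>     return new InetSocketAddress(uri.getHost(), uri.getPort());
>   }
> }
> {code}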
> h4. After Optimization FlameGraph:
> In the figure, we can see that DFSInputStream.getBestNodeDNAddrPair now
> accounts for only 0.54% of the total CPU. A ConcurrentHashMap serves as the
> cache, and ConcurrentHashMap.get() retrieves entries from it. The CPU usage
> of DFSInputStream.getBestNodeDNAddrPair has thus dropped from 4.86% to 0.54%.
> !After Optimization remark.png!
> h3. Original FlameGraph link:
> [Before
> Optimization|https://drive.google.com/file/d/133L5m75u2tu_KgKfGHZLEUzGR0XAfUl6/view?usp=sharing]
> [After Optimization
> FlameGraph|https://drive.google.com/file/d/133L5m75u2tu_KgKfGHZLEUzGR0XAfUl6/view?usp=sharing]