[ https://issues.apache.org/jira/browse/HADOOP-17222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stephen O'Donnell resolved HADOOP-17222. ---------------------------------------- Resolution: Fixed > Create socket address leveraging URI cache > ------------------------------------------- > > Key: HADOOP-17222 > URL: https://issues.apache.org/jira/browse/HADOOP-17222 > Project: Hadoop Common > Issue Type: Improvement > Components: common, hdfs-client > Environment: HBase version: 2.1.0 > JVM: -Xmx2g -Xms2g > hadoop hdfs version: 2.7.4 > disk:SSD > OS:CentOS Linux release 7.4.1708 (Core) > JMH Benchmark: @Fork(value = 1) > @Warmup(iterations = 300) > @Measurement(iterations = 300) > Reporter: fanrui > Assignee: fanrui > Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Attachments: After Optimization remark.png, After optimization.svg, > Before Optimization remark.png, Before optimization.svg > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Note:Not only the hdfs client can get the current benefit, all callers of > NetUtils.createSocketAddr will get the benefit. Just use hdfs client as an > example. > > Hdfs client selects best DN for hdfs Block. method call stack: > DFSInputStream.chooseDataNode -> getBestNodeDNAddrPair -> > NetUtils.createSocketAddr > NetUtils.createSocketAddr creates the corresponding InetSocketAddress based > on the host and port. There are some heavier operations in the > NetUtils.createSocketAddr method, for example: URI.create(target), so > NetUtils.createSocketAddr takes more time to execute. > The following is my performance report. The report is based on HBase calling > hdfs. HBase is a high-frequency access client for hdfs, because HBase read > operations often access a small DataBlock (about 64k) instead of the entire > HFile. In the case of high frequency access, the NetUtils.createSocketAddr > method is time-consuming. > h3. Test Environment: > > {code:java} > HBase version: 2.1.0 > JVM: -Xmx2g -Xms2g > hadoop hdfs version: 2.7.4 > disk:SSD > OS:CentOS Linux release 7.4.1708 (Core) > JMH Benchmark: @Fork(value = 1) > @Warmup(iterations = 300) > @Measurement(iterations = 300) > {code} > h4. Before Optimization FlameGraph: > In the figure, we can see that DFSInputStream.getBestNodeDNAddrPair accounts > for 4.86% of the entire CPU, and the creation of URIs accounts for a larger > proportion. > !Before Optimization remark.png! > h3. Optimization ideas: > NetUtils.createSocketAddr creates InetSocketAddress based on host and port. > Here we can add Cache to InetSocketAddress. The key of Cache is host and > port, and the value is InetSocketAddress. > h4. After Optimization FlameGraph: > In the figure, we can see that DFSInputStream.getBestNodeDNAddrPair accounts > for 0.54% of the entire CPU. Here, ConcurrentHashMap is used as the Cache, > and the ConcurrentHashMap.get() method gets data from the Cache. The CPU > usage of DFSInputStream.getBestNodeDNAddrPair has been optimized from 4.86% > to 0.54%. > !After Optimization remark.png! > h3. Original FlameGraph link: > [Before > Optimization|https://drive.google.com/file/d/133L5m75u2tu_KgKfGHZLEUzGR0XAfUl6/view?usp=sharing] > [After Optimization > FlameGraph|https://drive.google.com/file/d/133L5m75u2tu_KgKfGHZLEUzGR0XAfUl6/view?usp=sharing] -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org