[
https://issues.apache.org/jira/browse/HDFS-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866472#comment-16866472
]
Zheng Hu commented on HDFS-14541:
---------------------------------
Our XiaoMi HBase team ran a performance test in which the only difference was
whether this patch was included. (Setup: a cluster of 5 nodes, each with
12 x 800 GB SSDs; HBase used 50 GB on-heap + 50 GB off-heap memory. To isolate
the effect of this JIRA, we disabled the BlockCache so that all requests go
directly to the HDFS client.)
|| Case || Heap FlameGraph || CPU FlameGraph || HBase QPS & Latency ||
| Before Case | [before-heap-flame-graph.svg|https://issues.apache.org/jira/secure/attachment/12972068/before-heap-flame-graph.svg] | [before-cpu-flame-graph.svg|https://issues.apache.org/jira/secure/attachment/12972069/before-cpu-flame-graph.svg] | [before-QPS.png|https://issues.apache.org/jira/secure/attachment/12972067/before-QPS.png] |
| After Case | [after-heap-flame-graph.svg|https://issues.apache.org/jira/secure/attachment/12972071/after-heap-flame-graph.svg] | [after-cpu-flame-graph.svg|https://issues.apache.org/jira/secure/attachment/12972072/after-cpu-flame-graph.svg] | [after-QPS.png|https://issues.apache.org/jira/secure/attachment/12972070/after-QPS.png] |
We can clearly see that, after the patch, HBase throughput increased by about
17.8% ((33K QPS - 28K QPS) / 28K QPS ~ 17.8%). The roughly 6% CPU and 6% heap
allocation cost shown in the flame graphs also disappeared; both are very good
results.
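The throughput figure above is a plain relative change over the baseline. A minimal Java sketch of that arithmetic, using the rounded 28K and 33K QPS values from this comment (the class and method names are illustrative only):

```java
/** Relative throughput gain computed from the rounded QPS figures in the comment. */
public class ThroughputGain {

    // Relative change from a baseline to a new value; 0.10 means +10%.
    static double relativeGain(double before, double after) {
        if (before <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return (after - before) / before;
    }

    public static void main(String[] args) {
        // Before and after the patch, in queries per second.
        double gain = relativeGain(28_000, 33_000);
        System.out.printf("%.2f%%%n", gain * 100); // prints 17.86%
    }
}
```

With the rounded inputs this gives 5/28, about 17.86%, which matches the ~17.8% quoted above to within rounding.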
[~elgoiri], I think [~leosun08] is preparing a new patch to address your
comment. Thanks.
> ShortCircuitReplica#unref cost about 6% cpu and 6% heap allocation because of
> the frequent thrown NoSuchElementException in our HBase benchmark
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14541
> URL: https://issues.apache.org/jira/browse/HDFS-14541
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Zheng Hu
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-14541.000.patch, after-QPS.png,
> after-cpu-flame-graph.svg, after-heap-flame-graph.svg,
> async-prof-pid-94152-alloc-2.svg, async-prof-pid-94152-cpu-1.svg,
> before-QPS.png, before-cpu-flame-graph.svg, before-heap-flame-graph.svg
>
>
> Our XiaoMi HBase team is evaluating the performance improvement of
> HBASE-21879. We captured several CPU and heap flame graphs using
> async-profiler and found some performance issues in DFSClient.
> From the two attached flame graphs, we can conclude that the try/catch block
> in ShortCircuitCache#trimEvictionMaps has a serious performance problem: the
> frequently thrown NoSuchElementException is costly, so we should remove this
> try/catch from DFSClient.
> {code}
> /**
>  * Trim the eviction lists.
>  */
> private void trimEvictionMaps() {
>   long now = Time.monotonicNow();
>   demoteOldEvictableMmaped(now);
>   while (true) {
>     long evictableSize = evictable.size();
>     long evictableMmappedSize = evictableMmapped.size();
>     if (evictableSize + evictableMmappedSize <= maxTotalSize) {
>       return;
>     }
>     ShortCircuitReplica replica;
>     try {
>       if (evictableSize == 0) {
>         replica = (ShortCircuitReplica)evictableMmapped.get(evictableMmapped
>             .firstKey());
>       } else {
>         replica = (ShortCircuitReplica)evictable.get(evictable.firstKey());
>       }
>     } catch (NoSuchElementException e) {
>       break;
>     }
>     if (LOG.isTraceEnabled()) {
>       LOG.trace(this + ": trimEvictionMaps is purging " + replica +
>           StringUtils.getStackTrace(Thread.currentThread()));
>     }
>     purge(replica);
>   }
> }
> {code}
> Our Xiaomi HDFS team member [~leosun08] will prepare a patch for this issue.
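The underlying problem is using an exception to end a loop on a normal, frequent path: constructing a Throwable captures the current call stack, which costs CPU and allocates heap on every call. Below is a minimal, hypothetical Java sketch contrasting that pattern with checking for emptiness before calling firstKey(). It uses java.util.TreeMap as a stand-in for the eviction maps (the real field types in ShortCircuitCache may differ), the class and method names are made up for illustration, and it is not the actual HDFS-14541 patch.

```java
import java.util.NoSuchElementException;
import java.util.TreeMap;

/**
 * Hypothetical, simplified model of an eviction map (NOT the real
 * ShortCircuitCache): keys are eviction times, values are replica names.
 */
public class EvictionLoopSketch {

    // Exception-driven style: the loop only ends when firstKey() throws on an
    // empty map, so every call constructs a NoSuchElementException.
    static int drainWithException(TreeMap<Long, String> map) {
        int purged = 0;
        while (true) {
            Long eldest;
            try {
                eldest = map.firstKey(); // throws NoSuchElementException when empty
            } catch (NoSuchElementException e) {
                break;
            }
            map.remove(eldest);
            purged++;
        }
        return purged;
    }

    // Check-then-act style: test for emptiness before touching firstKey(),
    // so no exception is constructed on the normal exit path.
    static int drainWithCheck(TreeMap<Long, String> map) {
        int purged = 0;
        while (!map.isEmpty()) {
            Long eldest = map.firstKey();
            map.remove(eldest);
            purged++;
        }
        return purged;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> a = new TreeMap<>();
        TreeMap<Long, String> b = new TreeMap<>();
        for (long t = 0; t < 3; t++) {
            a.put(t, "replica-" + t);
            b.put(t, "replica-" + t);
        }
        System.out.println(drainWithException(a)); // 3
        System.out.println(drainWithCheck(b));     // 3

        // An explicitly thrown exception records the stack at construction time;
        // this capture is the source of the extra CPU and heap allocation.
        try {
            new TreeMap<Long, String>().firstKey();
        } catch (NoSuchElementException e) {
            System.out.println(e.getStackTrace().length > 0); // true
        }
    }
}
```

Both methods purge the same entries; the only difference is that the checked version never constructs an exception when the map runs empty, which is the kind of change this issue asks for.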
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)