[jira] [Commented] (HDFS-14541) ShortCircuitReplica#unref cost about 6% cpu and 6% heap allocation because of the frequent thrown NoSuchElementException in our HBase benchmark

Zheng Hu (JIRA) Tue, 18 Jun 2019 03:41:46 -0700


    [ https://issues.apache.org/jira/browse/HDFS-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866472#comment-16866472 ]


Zheng Hu commented on HDFS-14541:
---------------------------------

Our Xiaomi HBase team ran a performance test in which the only difference 
between the two runs is whether this patch is applied. (BTW, we built a 
cluster of 5 nodes, each with 12 * 800GB SSDs, and HBase uses 50GB on-heap + 
50GB off-heap. To isolate the effect of this JIRA, we disabled the blockCache 
so that all requests go to the HDFS client.)

|| Case || Heap FlameGraph || CPU FlameGraph || HBase QPS & Latency ||
| Before Case | [before-heap-flame-graph.svg|https://issues.apache.org/jira/secure/attachment/12972068/before-heap-flame-graph.svg] | [before-cpu-flame-graph.svg|https://issues.apache.org/jira/secure/attachment/12972069/before-cpu-flame-graph.svg] | [before-QPS.png|https://issues.apache.org/jira/secure/attachment/12972067/before-QPS.png] |
| After Case | [after-heap-flame-graph.svg|https://issues.apache.org/jira/secure/attachment/12972071/after-heap-flame-graph.svg] | [after-cpu-flame-graph.svg|https://issues.apache.org/jira/secure/attachment/12972072/after-cpu-flame-graph.svg] | [after-QPS.png|https://issues.apache.org/jira/secure/attachment/12972070/after-QPS.png] |

We can clearly see that after the patch, HBase throughput increased by about 
17.8% (math: (33K qps - 28K qps) / 28K qps ~ 17.8%). The 6% CPU and heap 
cost in the flame graphs also disappeared. Both are very good results. 
[~elgoiri], I think [~leosun08] is preparing a new patch to address your 
comment. Thanks.

> ShortCircuitReplica#unref cost about 6% cpu and 6% heap allocation because of 
> the frequent thrown NoSuchElementException  in our HBase benchmark
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14541
>                 URL: https://issues.apache.org/jira/browse/HDFS-14541
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Zheng Hu
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14541.000.patch, after-QPS.png, 
> after-cpu-flame-graph.svg, after-heap-flame-graph.svg, 
> async-prof-pid-94152-alloc-2.svg, async-prof-pid-94152-cpu-1.svg, 
> before-QPS.png, before-cpu-flame-graph.svg, before-heap-flame-graph.svg
>
>
> Our Xiaomi HBase team is evaluating the performance improvement of 
> HBASE-21879. We collected a few CPU and heap flame graphs with 
> async-profiler and found some performance issues in DFSClient. 
> See the two attached flame graphs: we can conclude that the try-catch block 
> in ShortCircuitCache#trimEvictionMaps has a serious performance problem, and 
> we should remove the try-catch from DFSClient. 
> {code}
>   /**
>    * Trim the eviction lists.
>    */
>   private void trimEvictionMaps() {
>     long now = Time.monotonicNow();
>     demoteOldEvictableMmaped(now);
>     while (true) {
>       long evictableSize = evictable.size();
>       long evictableMmappedSize = evictableMmapped.size();
>       if (evictableSize + evictableMmappedSize <= maxTotalSize) {
>         return;
>       }
>       ShortCircuitReplica replica;
>       try {
>         if (evictableSize == 0) {
>           replica = (ShortCircuitReplica)evictableMmapped.get(evictableMmapped
>               .firstKey());
>         } else {
>           replica = (ShortCircuitReplica)evictable.get(evictable.firstKey());
>         }
>       } catch (NoSuchElementException e) {
>         break;
>       }
>       if (LOG.isTraceEnabled()) {
>         LOG.trace(this + ": trimEvictionMaps is purging " + replica +
>             StringUtils.getStackTrace(Thread.currentThread()));
>       }
>       purge(replica);
>     }
>   }
> {code}
> Our Xiaomi HDFS team member [~leosun08] will prepare a patch for this issue.
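
For illustration only, here is a minimal sketch of how a trim loop like the one quoted above could avoid throwing NoSuchElementException, by asking the map for its first entry (which returns null when the map is empty) instead of calling firstKey() inside a try-catch. This is not the actual HDFS-14541 patch. The class EvictionTrimSketch and its trim method are hypothetical, the maps are assumed to be plain java.util.TreeMap instances, and removing the entry with pollFirstEntry() stands in for the real purge(replica) call.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the eviction logic; not Hadoop code.
final class EvictionTrimSketch {

    /**
     * Removes the eldest entries until the combined size of both maps is at
     * most maxTotalSize. Returns the number of entries removed.
     * No exception is used for control flow: pollFirstEntry() returns null
     * on an empty TreeMap, whereas firstKey() would throw
     * NoSuchElementException.
     */
    static <V> int trim(TreeMap<Long, V> evictable,
                        TreeMap<Long, V> evictableMmapped,
                        int maxTotalSize) {
        int purged = 0;
        while (evictable.size() + evictableMmapped.size() > maxTotalSize) {
            // Same selection rule as the quoted code: fall back to the
            // mmapped map only when the regular map is empty.
            Map.Entry<Long, V> eldest = evictable.isEmpty()
                    ? evictableMmapped.pollFirstEntry()
                    : evictable.pollFirstEntry();
            if (eldest == null) {
                // Both maps are empty; nothing left to evict.
                break;
            }
            purged++; // the real code would call purge(replica) here
        }
        return purged;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> evictable = new TreeMap<>();
        TreeMap<Long, String> evictableMmapped = new TreeMap<>();
        evictable.put(1L, "replica-1");
        evictable.put(2L, "replica-2");
        evictableMmapped.put(3L, "replica-3");
        int purged = trim(evictable, evictableMmapped, 1);
        System.out.println("purged=" + purged
                + " remaining=" + (evictable.size() + evictableMmapped.size()));
    }
}
```

The point of the sketch is that the empty-map case is handled by a null check on the normal path, so no exception object or stack trace is allocated there. Whether this matches the structure of the final patch is an assumption.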



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
