[jira] [Comment Edited] (HBASE-22387) Evaluate the get/scan performance after reading HFile block into offheap directly

Zheng Hu (JIRA) Thu, 30 May 2019 06:49:14 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-22387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851705#comment-16851705
 ]


Zheng Hu edited comment on HBASE-22387 at 5/30/19 1:48 PM:
-----------------------------------------------------------

Some issues we've found in the flamegraph: 
1.  About 45% of heap allocation is spent in the DFS client. In particular, 
BlockReaderFactory#requestFileDescriptors accounts for about 25% of heap 
allocation. After reading the code, we found that we allocate an 8KB buffer 
for a BufferedOutputStream each time to read the requestShortCircuitFds 
response, while the response should be quite small, so allocating such a big 
buffer is wasteful.  I will run a benchmark with a smaller buffer size, such 
as 1KB.  Another one is protobuf's CodedInputStream, which also uses a 4KB 
buffer to parse the received response; that is wasteful too. 
For the 8KB BufferedOutputStream, I uploaded a simple patch: 
[0001-Initialize-a-small-buffer-to-send-the-short-circuit-.patch|https://issues.apache.org/jira/secure/attachment/12970333/0001-Initialize-a-small-buffer-to-send-the-short-circuit-.patch]

2.  There appears to be a bug in BlockLocalReader#close, because we can clearly 
see that about 2% of CPU time is spent filling in a RuntimeException: 
 !screenshot-2.png! 
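
For point 1, the idea of passing an explicit small buffer size can be sketched 
roughly as below. This is not the attached patch: the class name, the 
SMALL_BUFFER_SIZE constant, the sendRequest helper and the 
ByteArrayOutputStream standing in for the peer's stream are all assumptions 
made for illustration. The only facts relied on are that 
java.io.BufferedOutputStream allocates an 8192-byte buffer by default and 
accepts an explicit size through its two-argument constructor.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class SmallBufferSketch {
    // Hypothetical constant: the 1KB size floated in the comment above.
    static final int SMALL_BUFFER_SIZE = 1024;

    // Hypothetical helper: writes a length-prefixed payload through a
    // BufferedOutputStream whose internal buffer has the given size.
    static byte[] sendRequest(byte[] payload, int bufferSize) {
        // Stand-in for the peer's output stream (assumption for this sketch).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out =
                 new DataOutputStream(new BufferedOutputStream(sink, bufferSize))) {
            out.writeInt(payload.length);
            out.write(payload);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[64]; // a deliberately small message
        byte[] viaDefaultSize = sendRequest(payload, 8192);
        byte[] viaSmallSize = sendRequest(payload, SMALL_BUFFER_SIZE);
        // The bytes on the wire do not depend on the buffer size; only the
        // per-call allocation differs.
        System.out.println(Arrays.equals(viaDefaultSize, viaSmallSize));
    }
}
```

The buffer size only changes how much is allocated per call, not what is 
written, which is why a smaller buffer looks safe for a message this small. 
Whether 1KB is the right size is what the benchmark would have to confirm.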



> Evaluate the get/scan performance after reading HFile block into offheap 
> directly
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-22387
>                 URL: https://issues.apache.org/jira/browse/HBASE-22387
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>         Attachments: 
> 0001-Initialize-a-small-buffer-to-send-the-short-circuit-.patch, 
> G1GC-stage-stw.case01.png, G1GC-stage-stw.case02-with-buffer-size-64KB.png, 
> G1GC-stw.case01.png, G1GC-stw.case02-with-buffer-size-64KB.png, 
> QPS-latency.case01.png, QPS-latency.case02-with-buffer-size-64KB.png, 
> async-prof-pid-25042-alloc-2.svg, async-prof-pid-25042-cpu-1.svg, 
> async-prof-pid-25042-lock-3.svg, blockReaderRemote.png, 
> blocksByHFile-stack-trace.png, screenshot-1.png, screenshot-2.png, 
> test-cluster-configuration-details.png
>
>
> All sub-tasks have now been resolved (except HBASE-21946, because of 
> the hadoop dependency problem), so I will provide some performance benchmarks 
> to show the latency improvement.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
