[ https://issues.apache.org/jira/browse/HBASE-21657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Hu updated HBASE-21657:
-----------------------------
    Hadoop Flags: Incompatible change
    Release Note: 
In HBASE-21657, I simplified the code path of estimatedSerializedSize() & 
estimatedSerializedSizeOfCell() by moving the general getSerializedSize()
and heapSize() from ExtendedCell up to the Cell interface. The patch also 
includes some other improvements:

1. In ~99% of cases our cells have no tags, so HFileScannerImpl now just 
   returns a NoTagsByteBufferKeyValue when a cell has no tags. This saves a 
   lot of CPU time when sending tag-less cells over RPC, because we can just 
   return the stored length instead of computing the serialized size from the 
   offset/length of each field (row/cf/cq, ...); see the first sketch after 
   this list.
2. Moved each subclass's getSerializedSize() implementation from ExtendedCell 
   into the subclass itself, so we no longer call ExtendedCell's 
   getSerializedSize() first and then forward to the subclass's 
   getSerializedSize(withTags).
3. Pre-sized the result ArrayList to avoid frequent list growth during a big 
   scan; we now estimate the initial capacity as min(scan.rows, 512) (see the 
   second sketch below). This also helps a lot.
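
Below is a minimal, self-contained sketch of the shape of changes 1 and 2. 
The names mirror HBase's Cell and NoTagsByteBufferKeyValue, but the interface 
and class here are simplified stand-ins, not the actual HBase code:

    // Sketch only: the real Cell interface has many more accessors. The point
    // is that the size methods now live on Cell itself rather than only on
    // ExtendedCell, so callers no longer go through ExtendedCell and forward
    // to the subclass implementation.
    interface Cell {
      int getSerializedSize();
      long heapSize();
    }

    // Sketch of the no-tags fast path (modeled on NoTagsByteBufferKeyValue):
    // a cell whose serialized length was recorded when it was read can answer
    // getSerializedSize() in O(1), instead of re-deriving the size from the
    // offset/length of each field (row/cf/cq, ...).
    class NoTagsCellSketch implements Cell {
      private final int length;  // serialized length, known at read time

      NoTagsCellSketch(int length) {
        this.length = length;
      }

      @Override
      public int getSerializedSize() {
        return length;  // O(1): no per-field offset/length arithmetic
      }

      @Override
      public long heapSize() {
        return length;  // simplified; the real class adds object overhead
      }
    }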
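
And a small sketch of change 3 (the helper and variable names are 
hypothetical, not the actual HBase code):

    import java.util.ArrayList;
    import java.util.List;

    class ResultListSizing {
      // Pre-size the per-request results list as min(rows, 512) rather than
      // relying on ArrayList's default capacity, so a big scan does not pay
      // for repeated backing-array growth.
      static <T> List<T> newResultsList(int rowsRequested) {
        return new ArrayList<>(Math.min(rowsRequested, 512));
      }
    }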

We gained an almost 40% throughput improvement in the 100% scan case on 
branch-2 (cacheHitRatio ~100%)[1], which is a good result. Note that this is 
an incompatible change in some cases: if downstream users have implemented 
their own Cell classes (rare, but it can happen), their code will no longer 
compile. A sketch of the required change follows.
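
For illustration, a hedged sketch of what such a downstream implementer now 
has to add (reusing the sketch Cell interface above; MyCustomCell and its 
field are hypothetical):

    class MyCustomCell implements Cell {
      private final byte[] serialized;  // hypothetical backing bytes

      MyCustomCell(byte[] serialized) {
        this.serialized = serialized;
      }

      // These two overrides are the new obligations; without them the class
      // no longer compiles against the updated Cell interface.
      @Override
      public int getSerializedSize() {
        return serialized.length;
      }

      @Override
      public long heapSize() {
        return 16L + serialized.length;  // rough estimate, sketch only
      }
    }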

> PrivateCellUtil#estimatedSerializedSizeOf has been the bottleneck in 100% 
> scan case.
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-21657
>                 URL: https://issues.apache.org/jira/browse/HBASE-21657
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5
>
>         Attachments: HBASE-21657.v1.patch, HBASE-21657.v2.patch, 
> HBASE-21657.v3.patch, HBASE-21657.v3.patch, HBASE-21657.v4.patch, 
> HBASE-21657.v5.patch, HBASE-21657.v5.patch, HBASE-21657.v5.patch, 
> HBASE-21657.v6.patch, HBASE-21657.v7.patch, 
> HBase1.4.9-ssd-10000000-rows-flamegraph.svg, 
> HBase1.4.9-ssd-10000000-rows-qps-latency.png, 
> HBase2.0.4-patch-v2-ssd-10000000-rows-qps-and-latency.png, 
> HBase2.0.4-patch-v2-ssd-10000000-rows.svg, 
> HBase2.0.4-patch-v3-ssd-10000000-rows-flamegraph.svg, 
> HBase2.0.4-patch-v3-ssd-10000000-rows-qps-and-latency.png, 
> HBase2.0.4-patch-v4-ssd-10000000-rows-flamegraph.svg, 
> HBase2.0.4-ssd-10000000-rows-flamegraph.svg, 
> HBase2.0.4-ssd-10000000-rows-qps-latency.png, HBase2.0.4-with-patch.v2.png, 
> HBase2.0.4-without-patch-v2.png, debug-the-ByteBufferKeyValue.diff, 
> hbase2.0.4-ssd-scan-traces.2.svg, hbase2.0.4-ssd-scan-traces.svg, 
> hbase20-ssd-100-scan-traces.svg, image-2019-01-07-19-03-37-930.png, 
> image-2019-01-07-19-03-55-577.png, overview-statstics-1.png, run.log
>
>
> We are evaluating the performance of branch-2 and found that scan throughput 
> on an SSD cluster is almost the same as on an HDD cluster, so I made a 
> FlameGraph on the RS and found that PrivateCellUtil#estimatedSerializedSizeOf 
> costs about 29% of CPU. Obviously, it has become the bottleneck in the 100% 
> scan case.
> See the [^hbase20-ssd-100-scan-traces.svg]
> BTW, in our XiaoMi branch we introduced 
> HRegion#updateReadRequestsByCapacityUnitPerSecond to sum up the size of cells 
> (for metrics monitoring), so the performance loss was amplified there.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
