[
https://issues.apache.org/jira/browse/HBASE-21657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734744#comment-16734744
]
stack commented on HBASE-21657:
-------------------------------
bq. If necessary, I can provide the FlameGraph for HDD too.
Nah. It would have been good to confirm it was the same profile, but we can
guess they were probably the same and that CPU was the bottleneck.
On inlining, you can enable a flag on the JVM to see the JIT in action and
rule on whether methods are being inlined or not. Section 6.1 here,
https://docs.google.com/document/d/1vZ_k6_pNR1eQxID5u1xFihuPC7FkPaJQW8c4M5eA2AQ/edit#heading=h.izcvb7v6f7pf,
has some notes, and late in the doc it describes how to get jitwatch working
too, which draws nice graphics of what is inlined vs. what is not; if you
enable disassembly, you can see the actual code generated, which can help in
debugging where the time goes. FYI.
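For reference, the flags I mean are roughly these (standard HotSpot
diagnostic flags; PrintAssembly additionally needs the hsdis library
installed):
{noformat}
# print inlining decisions as the JIT makes them
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining ...
# write a compilation log (hotspot_pid<NNN>.log) that jitwatch can load
java -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation ...
# dump the generated assembly (needs hsdis)
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly ...
{noformat}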
If the layer above does not completely overlap the layer below, I read that
as dispatching cost: the JIT trying to figure out what to call when it is not
straightforward (i.e. when methods are not private or static, as you write
above); e.g. the step from StoreScanner#next up to the various instances of
PrivateCellUtil#estimatedSerializedSizeOf in the first svg.
Your patch made for a 40% difference in throughput? It looks like it made
about a 5% difference in how much CPU the scan takes of overall CPU usage.
The simplified dispatch could be the difference.
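To make the dispatch point concrete, a minimal sketch (made-up names, not
HBase code): once a call site like the one below has seen a few different
receiver classes, HotSpot treats it as megamorphic and stops inlining it, and
that is what shows up as a non-overlapping step in the flame graph.
{code:java}
interface Sized {
  int serializedSize();
}

final class SizeSum {
  // If many Sized implementations flow through this loop, the
  // serializedSize() call stays a virtual dispatch the JIT cannot inline.
  static long total(Iterable<? extends Sized> cells) {
    long sum = 0;
    for (Sized c : cells) {
      sum += c.serializedSize();
    }
    return sum;
  }
}
{code}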
It looks like estimatedSerializedSizeOf is just expensive. There is no reuse
of calculated lengths the way BBKVComparator#compare does it... That could
get some cycles back. Caching the serialized size, or figuring out whether we
have to compute the size so often, could help.
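A rough sketch of the caching idea (hypothetical class and layout, not
current HBase code): compute the size once, then answer from a field.
{code:java}
// Hypothetical: memoize the serialized size so repeated size queries
// during a scan become a field read instead of a recomputation.
final class SizedCell {
  private final byte[] key;
  private final byte[] value;
  private int serializedSize = -1; // -1 == not yet computed

  SizedCell(byte[] key, byte[] value) {
    this.key = key;
    this.value = value;
  }

  int getSerializedSize() {
    int s = serializedSize;
    if (s < 0) {
      // length prefixes plus payload; the real KeyValue layout has more fields
      s = 2 * Integer.BYTES + key.length + value.length;
      serializedSize = s; // benign race: recomputation yields the same value
    }
    return s;
  }
}
{code}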
bq. avoiding the frequent list extension
I see this in the first svg but not in the second. Perhaps you had set the
initial size to 1000 in the second run. We should do this, yeah, but don't we
have the number somewhere of how many rows to return per Scan call, so we
don't have to guess 1000?
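Something like the below is what I have in mind (sketch only; whether
Scan#getCaching() or a server-side limit is the right source is up for
discussion):
{code:java}
// Presize the results list from what the scan already knows rather than
// guessing a fixed 1000. 'scan' is the incoming client Scan;
// Scan#getCaching() returns -1 when unset.
int caching = scan.getCaching();
int initialCapacity = caching > 0 ? caching : 16;
List<Cell> results = new ArrayList<>(initialCapacity);
{code}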
bq. Currently, the Cell interface seems to try to expose few concepts to the
upstream user
Yes. Was trying to keep it minimal.
bq. So I'm not sure whether moving the getSerializedCell (tags or not) method
into the Cell interface is correct or not, even if we gain a 40% speedup
Getting the serialized size is generic enough that it could go up into Cell
(good to ask Anoop too). Exposing a version that takes tags shouldn't go into
Cell (all of the current tag API in Cell is deprecated).
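Something like the below is what I'd picture going into Cell (a sketch only;
naming and placement to be discussed with Anoop and others):
{code:java}
public interface Cell {
  // ... existing accessors: getRowArray(), getFamilyArray(), etc. ...

  /**
   * Serialized size of this cell, without tags. Implementations can answer
   * from a cached field rather than re-deriving the lengths on each call.
   */
  int getSerializedSize();
}
{code}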
Regarding a patched 0.98, we have enough to work with given your current
flame graph from 2.0.x; I was just asking. It can help to see where hbase1
was spending its CPU compared to hbase2.
> PrivateCellUtil#estimatedSerializedSizeOf has been the bottleneck in 100%
> scan case.
> ------------------------------------------------------------------------------------
>
> Key: HBASE-21657
> URL: https://issues.apache.org/jira/browse/HBASE-21657
> Project: HBase
> Issue Type: Bug
> Components: Performance
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5
>
> Attachments: HBASE-21657.v1.patch, HBASE-21657.v2.patch,
> HBase2.0.4-with-patch.v2.png, HBase2.0.4-without-patch-v2.png,
> hbase2.0.4-ssd-scan-traces.2.svg, hbase2.0.4-ssd-scan-traces.svg,
> hbase20-ssd-100-scan-traces.svg
>
>
> We are evaluating the performance of branch-2 and found that the throughput
> of scan on an SSD cluster is almost the same as on an HDD cluster, so I made
> a FlameGraph on the RS and found that
> PrivateCellUtil#estimatedSerializedSizeOf costs about 29% of CPU. Obviously,
> it has been the bottleneck in the 100% scan case.
> See the [^hbase20-ssd-100-scan-traces.svg]
> BTW, in our XiaoMi branch, we introduced an
> HRegion#updateReadRequestsByCapacityUnitPerSecond to sum up the sizes of
> cells (for metric monitoring), so it seems the performance loss was
> amplified.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)