[
https://issues.apache.org/jira/browse/HBASE-21657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734744#comment-16734744
]
stack commented on HBASE-21657:
-------------------------------
bq. If necessary, I can provide the FlameGraph for HDD too.
Nah. It would have been good to confirm it was the same profile, but we can
guess they were probably the same and that CPU was the bottleneck.
On inlining, you can enable a flag on the JVM to see the JIT in action and
rule on whether methods are being inlined or not. Section 6.1 here,
https://docs.google.com/document/d/1vZ_k6_pNR1eQxID5u1xFihuPC7FkPaJQW8c4M5eA2AQ/edit#heading=h.izcvb7v6f7pf,
has some notes, and late in the doc it describes how to get jitwatch working
too, which draws nice graphics of what is inlined vs. what is not; if you
enable disassembly, you can see the actual code generated, which can help in
debugging where the time goes. FYI.
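For reference, the flags I mean are roughly these (standard HotSpot
diagnostic flags; PrintAssembly additionally needs the hsdis library
installed):
{noformat}
# print inlining decisions as the JIT makes them
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining ...
# write a compilation log (hotspot_pid<NNN>.log) that jitwatch can load
java -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation ...
# dump the generated assembly (needs hsdis)
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly ...
{noformat}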
If the layer above does not completely overlap the layer below, I read that
as dispatching cost: the JIT trying to figure out what to call when it is not
straightforward (i.e. when methods are not private or static, as you write
above); e.g. the step from StoreScanner#next up to the various instances of
PrivateCellUtil#estimatedSerializedSizeOf in the first svg.
Your patch made for a 40% difference in throughput? It looks like it made
about a 5% difference in how much CPU the scan takes of overall CPU usage.
The simplified dispatch could be the difference.
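To make the dispatch point concrete, a minimal sketch (made-up names, not
HBase code): once a call site like the one below has seen a few different
receiver classes, HotSpot treats it as megamorphic and stops inlining it, and
that is what shows up as a non-overlapping step in the flame graph.
{code:java}
interface Sized {
  int serializedSize();
}

final class SizeSum {
  // If many Sized implementations flow through this loop, the
  // serializedSize() call stays a virtual dispatch the JIT cannot inline.
  static long total(Iterable<? extends Sized> cells) {
    long sum = 0;
    for (Sized c : cells) {
      sum += c.serializedSize();
    }
    return sum;
  }
}
{code}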
It looks like estimatedSerializedSizeOf is just expensive. There is no reuse
of calculated lengths the way BBKVComparator#compare does it... That could
get some cycles back. Caching the serialized size, or figuring out whether we
have to compute the size so often, could help.
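A rough sketch of the caching idea (hypothetical class and layout, not
current HBase code): compute the size once, then answer from a field.
{code:java}
// Hypothetical: memoize the serialized size so repeated size queries
// during a scan become a field read instead of a recomputation.
final class SizedCell {
  private final byte[] key;
  private final byte[] value;
  private int serializedSize = -1; // -1 == not yet computed

  SizedCell(byte[] key, byte[] value) {
    this.key = key;
    this.value = value;
  }

  int getSerializedSize() {
    int s = serializedSize;
    if (s < 0) {
      // length prefixes plus payload; the real KeyValue layout has more fields
      s = 2 * Integer.BYTES + key.length + value.length;
      serializedSize = s; // benign race: recomputation yields the same value
    }
    return s;
  }
}
{code}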
bq. avoiding the frequent list extension
I see this in the first svg but not in the second. Perhaps you had set the
initial size to 1000 in the second run. We should do this, yeah, but don't we
have the number somewhere of how many rows to return per Scan call, so we
don't have to guess 1000?
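Something like the below is what I have in mind (sketch only; whether
Scan#getCaching() or a server-side limit is the right source is up for
discussion):
{code:java}
// Presize the results list from what the scan already knows rather than
// guessing a fixed 1000. 'scan' is the incoming client Scan;
// Scan#getCaching() returns -1 when unset.
int caching = scan.getCaching();
int initialCapacity = caching > 0 ? caching : 16;
List<Cell> results = new ArrayList<>(initialCapacity);
{code}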
bq. Currently, the Cell interface seems to try to expose few concepts to the
upstream user
Yes. Was trying to keep it minimal.
bq. So I'm not sure whether moving the getSerializedCell (tags or not) method
into the Cell interface is correct or not, even if we gain a 40% speedup
Getting the serialized size is generic enough that it could go up into Cell
(good to ask Anoop too). Exposing a version that takes tags shouldn't go into
Cell (all of the current tag API in Cell is deprecated).
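Something like the below is what I'd picture going into Cell (a sketch only;
naming and placement to be discussed with Anoop and others):
{code:java}
public interface Cell {
  // ... existing accessors: getRowArray(), getFamilyArray(), etc. ...

  /**
   * Serialized size of this cell, without tags. Implementations can answer
   * from a cached field rather than re-deriving the lengths on each call.
   */
  int getSerializedSize();
}
{code}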
Regarding a patched 0.98, we have enough to work with given your current
flame graph from 2.0.x; I was just asking. It can help to see where hbase1
was spending its CPU compared to hbase2.
> PrivateCellUtil#estimatedSerializedSizeOf has been the bottleneck in 100%
> scan case.
> ------------------------------------------------------------------------------------
>
> Key: HBASE-21657
> URL: https://issues.apache.org/jira/browse/HBASE-21657
> Project: HBase
> Issue Type: Bug
> Components: Performance
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5
>
> Attachments: HBASE-21657.v1.patch, HBASE-21657.v2.patch,
> HBase2.0.4-with-patch.v2.png, HBase2.0.4-without-patch-v2.png,
> hbase2.0.4-ssd-scan-traces.2.svg, hbase2.0.4-ssd-scan-traces.svg,
> hbase20-ssd-100-scan-traces.svg
>
>
> We are evaluating the performance of branch-2 and found that the throughput
> of scan on an SSD cluster is almost the same as on an HDD cluster, so I made
> a FlameGraph on the RS and found that
> PrivateCellUtil#estimatedSerializedSizeOf costs about 29% of CPU. Obviously,
> it has been the bottleneck in the 100% scan case.
> See the [^hbase20-ssd-100-scan-traces.svg]
> BTW, in our XiaoMi branch, we introduced an
> HRegion#updateReadRequestsByCapacityUnitPerSecond to sum up the sizes of
> cells (for metric monitoring), so it seems the performance loss was
> amplified.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)