[ 
https://issues.apache.org/jira/browse/HBASE-21657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733921#comment-16733921
 ] 

Zheng Hu commented on HBASE-21657:
----------------------------------

{quote}Whats the hdd flamegraph look like? It was same version? Where is it 
spending time? In same place?
{quote}
I tested both HDD and SSD on the same version, and the throughput is almost 
the same (about 2000 ops/sec for a single node). I did not capture a 
FlameGraph for the HDD node, though. If necessary, I can provide one for HDD 
too.
{quote}Which are these Zheng Hu ? And when you say, did not work, is it that 
they are not inlining?
{quote}
I mean the PrivateCellUtil#estimatedSerializedSizeOf method. Its implementation 
is not so lightweight: it uses instanceof and class casts frequently. There's a 
post[1] that says: _Methods that can be inlined include static, private or final 
methods but also public methods if it can be determined that they are not 
overridden._
 So I guessed the hot PrivateCellUtil#estimatedSerializedSizeOf was not being 
inlined, and tried moving estimatedSerializedSizeOf from ExtendedCell to Cell 
for comparison. The instanceof and class cast do seem to cost some CPU, but 
inlining still does not seem to happen after patch.v2 (perhaps because of the 
many overriding implementations? I'm not sure). After applying patch.v2, a 
large percentage of CPU is still spent in 
PrivateCellUtil#estimatedSerializedSizeOf.
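
To make the two shapes being compared concrete, here is a minimal sketch. All type and method names in it (BasicCell, SizedCell, HeapCell, BufferCell, SizeEstimator) are hypothetical stand-ins and not the HBase classes, and the size formula is an assumed layout. It only shows the structural difference between a utility method that dispatches with instanceof plus a cast and a single call through an interface default method. Whether the JIT inlines either form depends on the receiver types it observes at the call site, so this is not a benchmark.

```java
// Hypothetical, simplified stand-ins for illustration; these are NOT the HBase types.
interface BasicCell {
    int keyLength();
    int valueLength();

    // Default implementation: two 4-byte length prefixes plus the payload (assumed layout).
    default int serializedSize() {
        return 2 * Integer.BYTES + keyLength() + valueLength();
    }
}

// A cell that already knows its own serialized size (plays the role of an "extended" cell).
interface SizedCell extends BasicCell {
    int precomputedSize();

    @Override
    default int serializedSize() {
        return precomputedSize();
    }
}

final class HeapCell implements BasicCell {
    private final int keyLength;
    private final int valueLength;

    HeapCell(int keyLength, int valueLength) {
        this.keyLength = keyLength;
        this.valueLength = valueLength;
    }

    @Override public int keyLength() { return keyLength; }
    @Override public int valueLength() { return valueLength; }
}

final class BufferCell implements SizedCell {
    private final int keyLength;
    private final int valueLength;

    BufferCell(int keyLength, int valueLength) {
        this.keyLength = keyLength;
        this.valueLength = valueLength;
    }

    @Override public int keyLength() { return keyLength; }
    @Override public int valueLength() { return valueLength; }
    @Override public int precomputedSize() { return 2 * Integer.BYTES + keyLength + valueLength; }
}

final class SizeEstimator {
    private SizeEstimator() {}

    // Shape A: utility method that checks the runtime type and casts on every call.
    static int viaInstanceOf(BasicCell cell) {
        if (cell instanceof SizedCell) {
            return ((SizedCell) cell).precomputedSize();
        }
        return 2 * Integer.BYTES + cell.keyLength() + cell.valueLength();
    }

    // Shape B: one interface call; the base interface supplies a default.
    static int viaInterface(BasicCell cell) {
        return cell.serializedSize();
    }

    public static void main(String[] args) {
        BasicCell[] cells = { new HeapCell(30, 100), new BufferCell(30, 100) };
        for (BasicCell cell : cells) {
            System.out.println(viaInstanceOf(cell) + " " + viaInterface(cell));
        }
    }
}
```

Both shapes return the same number for every cell; the difference is only in how the call is dispatched. On HotSpot, a run with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining prints the inlining decisions for a given method.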

Another optimization: we can give the results list a fixed initial capacity to 
*avoid frequent list growth* during a big scan, such as the YCSB case, where 
each scan has a limit chosen at random between 1 and 1000.
{code:java}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
index f788a86..6a8b596 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
@@ -3484,7 +3484,7 @@ public class RSRpcServices implements HBaseRPCErrorHandler,
     MutableObject<Object> lastBlock = new MutableObject<>();
     boolean scannerClosed = false;
     try {
-      List<Result> results = new ArrayList<>();
+      List<Result> results = new ArrayList<>(rows);
{code}
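
A standalone sketch of the same idea follows, outside of RSRpcServices. The class PresizedResults, the constant MAX_PRESIZE and the string payload are hypothetical and not part of the patch. The clamp is an assumption on my side: new ArrayList<>(n) throws IllegalArgumentException for a negative n and allocates the backing array eagerly, so a caller-supplied limit probably should be bounded before it is used as a capacity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not code from the patch.
final class PresizedResults {
    // Assumed upper bound so that a huge requested limit does not allocate a huge array.
    static final int MAX_PRESIZE = 1024;

    private PresizedResults() {}

    // Clamp the requested row count into [0, MAX_PRESIZE] before using it as a capacity.
    static int initialCapacity(int rows) {
        return Math.min(Math.max(rows, 0), MAX_PRESIZE);
    }

    // `rows` stands for the scan limit requested by the client.
    static List<String> collect(int rows) {
        // The argument is only a capacity hint; the list still starts empty.
        List<String> results = new ArrayList<>(initialCapacity(rows));
        for (int i = 0; i < rows; i++) {
            results.add("row-" + i);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(collect(1000).size());
    }
}
```

With a limit of up to 1000 rows, the presized list never has to copy its backing array, whereas a default-constructed ArrayList in OpenJDK starts small and grows by about 1.5x each time it fills up.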
 
{quote}Adding serialized size to Cell Interface is a radical change but having 
defaults makes it easier and hard to argue w/ a 40% speedup
{quote}
Currently, the Cell interface tries to expose as few concepts as possible to 
upstream users. For example, tags are not used by upstream users but are used 
on the HBase server side (the ACL and MOB features depend on tags), and we 
strip the tag bytes when encoding server-side cells before sending them to the 
user. So I'm not sure whether moving the getSerilizedCell (with tags or not) 
method into the Cell interface is correct, even if we gain a 40% speedup. What 
do you think, [~anoop.hbase]?
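
To illustrate why the "tags or not" question matters for a size method on a public interface, here is a small sketch. The class TaggedSizeSketch and its byte layout are assumptions made for the example (4-byte key and value length prefixes, plus a 2-byte tags length when tags are present); it is not the HBase implementation. The point is only that the server-side size and the size the client sees can differ for the same cell.

```java
// Hypothetical sketch; the layout below is assumed for illustration only.
final class TaggedSizeSketch {
    private TaggedSizeSketch() {}

    // Size of one cell, optionally counting the tag section.
    static int serializedSize(int keyLen, int valueLen, int tagsLen, boolean withTags) {
        int size = 2 * Integer.BYTES + keyLen + valueLen;
        if (withTags && tagsLen > 0) {
            size += Short.BYTES + tagsLen; // tags length prefix plus tag bytes
        }
        return size;
    }

    public static void main(String[] args) {
        // Same cell, measured as stored on the server and as sent to the client.
        System.out.println(serializedSize(30, 100, 12, true));
        System.out.println(serializedSize(30, 100, 12, false));
    }
}
```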
{quote}Could you get a flame graph from your current prod to see what its 
flamegraph looks like – where branch-1 is spending its time Scanning?
{quote}
Our prod branch is currently based on HBase 0.98, with many backported features 
and a lot of customization, so it would not be a good comparison. But maybe I 
can provide a FlameGraph based on the community branch-1.

Thanks.

1. [https://techblug.wordpress.com/2013/08/19/java-jit-compiler-inlining/]

> PrivateCellUtil#estimatedSerializedSizeOf has been the bottleneck in 100% 
> scan case.
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-21657
>                 URL: https://issues.apache.org/jira/browse/HBASE-21657
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5
>
>         Attachments: HBASE-21657.v1.patch, HBASE-21657.v2.patch, 
> HBase2.0.4-with-patch.v2.png, HBase2.0.4-without-patch-v2.png, 
> hbase2.0.4-ssd-scan-traces.2.svg, hbase2.0.4-ssd-scan-traces.svg, 
> hbase20-ssd-100-scan-traces.svg
>
>
> We are evaluating the performance of branch-2, and found that the scan 
> throughput of the SSD cluster is almost the same as that of the HDD cluster. 
> So I made a FlameGraph on the RS and found that 
> PrivateCellUtil#estimatedSerializedSizeOf costs about 29% of CPU. Obviously, 
> it has become the bottleneck in the 100% scan case.
> See the [^hbase20-ssd-100-scan-traces.svg]
> BTW, in our XiaoMi branch, we introduced 
> HRegion#updateReadRequestsByCapacityUnitPerSecond to sum up the size of cells 
> (for metric monitoring), so the performance loss seems to have been amplified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
