[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

ramkrishna.s.vasudevan (Jira) Thu, 06 Aug 2020 09:54:20 -0700


    [ https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172541#comment-17172541 ]


ramkrishna.s.vasudevan commented on HBASE-24754:
------------------------------------------------

I was able to verify this in my local Linux VM, and the significant drop is due 
to the Comparator. 

branch-1.3 consistently took ~11 to 12 secs, whereas branch-2 varies much more, 
from 15 to 22 secs. 

The stack traces below explain the reason. 
Branch-1.3:
{code}
"main" #1 prio=5 os_prio=0 tid=0x00007f5ffc010800 nid=0x4b0b runnable [0x00007f6003887000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1897)
        at java.util.TreeMap.put(TreeMap.java:552)
        at java.util.TreeSet.add(TreeSet.java:255)
        at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce1Row(PutSortReducer.java:104)
        at org.apache.hadoop.hbase.mapreduce.PutSortReducer.main(PutSortReducer.java:157)
{code}
The code there is:
{code}
 return Bytes.compareTo(left, loffset + lfamilylength,
        llength - lfamilylength,
        right, roffset + rfamilylength, rlength - rfamilylength);
{code}
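To make the branch-1.3 snippet above concrete: it boils down to a single lexicographic comparison over the bytes that follow the family in each key. Below is a minimal sketch of that idea using only the JDK (Java 9+ `Arrays.compareUnsigned`), assuming unsigned byte ordering. The class name `RangeCompareSketch`, the helper `compareRange` and the sample buffers are hypothetical and are not HBase code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RangeCompareSketch {
    // Compares two byte ranges given as (array, offset, length), in unsigned
    // lexicographic order. Hypothetical stand-in for an offset/length compare.
    static int compareRange(byte[] left, int loffset, int llength,
                            byte[] right, int roffset, int rlength) {
        return Arrays.compareUnsigned(left, loffset, loffset + llength,
                                      right, roffset, roffset + rlength);
    }

    public static void main(String[] args) {
        // Made-up buffers: a 2-byte "family" followed by the remaining bytes.
        byte[] left = "cf:q1".getBytes(StandardCharsets.US_ASCII);
        byte[] right = "cf:q2".getBytes(StandardCharsets.US_ASCII);
        int familyLength = 2;

        // Skip the family and compare everything after it in one call.
        int result = compareRange(left, familyLength, left.length - familyLength,
                                  right, familyLength, right.length - familyLength);
        System.out.println(result < 0 ? "left sorts first" : "right sorts first or equal");
    }
}
```

The point of the sketch is only that the offsets are already in hand when the comparison runs, so each call is one pass over the two ranges.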
Whereas in the branch-2 code base the stack is:
{code}
"main" #1 prio=5 os_prio=0 tid=0x00007f4a48016000 nid=0x488a runnable [0x00007f4a507bb000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.util.Bytes$ConverterHolder$UnsafeConverter.toShort(Bytes.java:1533)
        at org.apache.hadoop.hbase.util.Bytes.toShort(Bytes.java:1127)
        at org.apache.hadoop.hbase.util.Bytes.toShort(Bytes.java:1111)
        at org.apache.hadoop.hbase.KeyValue.getRowLength(KeyValue.java:1337)
        at org.apache.hadoop.hbase.KeyValue.getFamilyOffset(KeyValue.java:1353)
        at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1368)
        at org.apache.hadoop.hbase.KeyValue.getQualifierLength(KeyValue.java:1406)
        at org.apache.hadoop.hbase.CellComparatorImpl.compareQualifiers(CellComparatorImpl.java:169)
        at org.apache.hadoop.hbase.CellComparatorImpl.compareColumns(CellComparatorImpl.java:105)
        at org.apache.hadoop.hbase.CellComparatorImpl.compareWithoutRow(CellComparatorImpl.java:266)
        at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:86)
        at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:67)
        at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:45)
        at java.util.TreeMap.put(TreeMap.java:552)
        at java.util.TreeSet.add(TreeSet.java:255)
        at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce1Row(PutSortReducer.java:191)
        at org.apache.hadoop.hbase.mapreduce.PutSortReducer.main(PutSortReducer.java:242)
{code}
So in branch-2 we do more work for each comparison, and that adds up when we 
have large rows. I think a similar thing is happening in the other issue, where 
we try to filter out a large number of rows during a scan (just a guess, I have 
not spent time on that).
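
To illustrate why extra work inside the comparator matters here, the sketch below counts how often length fields are decoded while cells of a single row are added to a `TreeSet`. It is an assumption-laden toy and not code from either branch: the byte layout, the class `ComparatorCostSketch`, and the helpers `encode`, `qualifierOffset`, `countRederiving` and `countCached` are all hypothetical. One comparator re-derives the qualifier offset from the length fields on every call; the other works on a wrapper that decoded the offset once.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

public class ComparatorCostSketch {
    // Number of length-field decodes performed so far.
    static long decodes = 0;

    // Hypothetical simplified layout (not the real KeyValue format):
    // [2-byte row length][row][1-byte family length][family][qualifier]
    static byte[] encode(String row, String family, String qualifier) {
        byte[] r = row.getBytes(StandardCharsets.US_ASCII);
        byte[] f = family.getBytes(StandardCharsets.US_ASCII);
        byte[] q = qualifier.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[2 + r.length + 1 + f.length + q.length];
        out[0] = (byte) (r.length >> 8);
        out[1] = (byte) r.length;
        System.arraycopy(r, 0, out, 2, r.length);
        out[2 + r.length] = (byte) f.length;
        System.arraycopy(f, 0, out, 3 + r.length, f.length);
        System.arraycopy(q, 0, out, 3 + r.length + f.length, q.length);
        return out;
    }

    // Decodes both length fields to locate the qualifier (2 decodes per call).
    static int qualifierOffset(byte[] cell) {
        decodes++;
        int rowLength = ((cell[0] & 0xff) << 8) | (cell[1] & 0xff);
        decodes++;
        int familyLength = cell[2 + rowLength] & 0xff;
        return 3 + rowLength + familyLength;
    }

    // Re-derives the qualifier offset of both cells on every comparison.
    static final Comparator<byte[]> REDERIVING = (a, b) -> {
        int ao = qualifierOffset(a);
        int bo = qualifierOffset(b);
        return Arrays.compareUnsigned(a, ao, a.length, b, bo, b.length);
    };

    // Wrapper that decodes the qualifier offset once, at construction time.
    static final class Indexed {
        final byte[] cell;
        final int qOffset;
        Indexed(byte[] cell) {
            this.cell = cell;
            this.qOffset = qualifierOffset(cell);
        }
    }

    // Uses the cached offsets, so comparisons decode nothing.
    static final Comparator<Indexed> CACHED = (a, b) ->
        Arrays.compareUnsigned(a.cell, a.qOffset, a.cell.length,
                               b.cell, b.qOffset, b.cell.length);

    static long countRederiving(int n) {
        decodes = 0;
        TreeSet<byte[]> set = new TreeSet<>(REDERIVING);
        for (int i = 0; i < n; i++) {
            set.add(encode("row1", "cf", String.format("q%06d", i)));
        }
        return decodes;
    }

    static long countCached(int n) {
        decodes = 0;
        TreeSet<Indexed> set = new TreeSet<>(CACHED);
        for (int i = 0; i < n; i++) {
            set.add(new Indexed(encode("row1", "cf", String.format("q%06d", i))));
        }
        return decodes;
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("decodes, re-deriving comparator: " + countRederiving(n));
        System.out.println("decodes, cached offsets:         " + countCached(n));
    }
}
```

Each `TreeSet.add` invokes the comparator roughly log2(n) times, so any per-call decoding is multiplied by that factor for every cell in the row, while the cached variant decodes a fixed amount per cell. This only models the shape of the cost; it says nothing about how either branch should actually be fixed.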

> Bulk load performance is degraded in HBase 2 
> ---------------------------------------------
>
>                 Key: HBASE-24754
>                 URL: https://issues.apache.org/jira/browse/HBASE-24754
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>    Affects Versions: 2.2.3
>            Reporter: Ajeet Rai
>            Priority: Major
>         Attachments: Branch1.3_putSortReducer_sampleCode.patch, 
> Branch2_putSortReducer_sampleCode.patch
>
>
> In our test, it is observed that bulk load performance is degraded in HBase 2.
>  Test Input: 
> 1: Table with 500 regions (300 column families)
> 2: Data = 2 TB
> Data Sample
> 18600000001201502051000000068110,18600000001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111111111111111111111111111111111111111111111111111111111111111111111111111111111
> 3: Cluster: 7 nodes (2 Master + 5 Region Server)
> 4: The number of containers launched is the same in both cases
> HBase 2 took 10% more time than HBase 1.3, where the test input is the same 
> for both clusters
>  
> |Feature|HBase 2.2.3 Time(Sec)|HBase 1.3.1 Time(Sec)|Diff%|Snappy lib|
> |BulkLoad|21837|19686.16|-10.93|HBase 2.2.3: 1.4, HBase 1.3.1: 1.4|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
