[
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172541#comment-17172541
]
ramkrishna.s.vasudevan commented on HBASE-24754:
------------------------------------------------
I was able to reproduce this in my local Linux VM, and the significant drop is due to
the Comparator.
Branch-1.3 consistently took ~11 to 12 secs, whereas branch-2 varied widely,
from 15 to 22 secs.
The stack traces below explain the reason.
Branch-1.3
{code}
"main" #1 prio=5 os_prio=0 tid=0x00007f5ffc010800 nid=0x4b0b runnable [0x00007f6003887000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1897)
	at java.util.TreeMap.put(TreeMap.java:552)
	at java.util.TreeSet.add(TreeSet.java:255)
	at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce1Row(PutSortReducer.java:104)
	at org.apache.hadoop.hbase.mapreduce.PutSortReducer.main(PutSortReducer.java:157)
{code}
The code at that frame is:
{code}
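// Compare the qualifier portion: the offsets and lengths are already known,
// so this is a single raw byte comparison.
return Bytes.compareTo(left, loffset + lfamilylength,
    llength - lfamilylength,
    right, roffset + rfamilylength, rlength - rfamilylength);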
{code}
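For readers less familiar with the branch-1 comparator, here is a minimal, self-contained sketch of the same idea. It does not use HBase: compareSlices is a hypothetical stand-in for Bytes.compareTo (unsigned lexicographic order, a prefix sorts first), and the family-first step in compareColumns is an assumption about the surrounding method. The point is that the caller already holds each column's offset, total length and family length, so the qualifier comparison is pure offset arithmetic.

{code:java}
// Hypothetical sketch, not HBase code: compares two "columns" (family bytes
// followed by qualifier bytes) when each slice's offset, total length and
// family length are already known to the caller.
final class ColumnCompareSketch {

  // Unsigned lexicographic comparison of two byte slices; a slice that is a
  // prefix of the other sorts first (stand-in for Bytes.compareTo).
  static int compareSlices(byte[] left, int loffset, int llength,
                           byte[] right, int roffset, int rlength) {
    int n = Math.min(llength, rlength);
    for (int i = 0; i < n; i++) {
      int a = left[loffset + i] & 0xff;
      int b = right[roffset + i] & 0xff;
      if (a != b) {
        return a - b;
      }
    }
    return llength - rlength;
  }

  static int compareColumns(byte[] left, int loffset, int llength, int lfamilylength,
                            byte[] right, int roffset, int rlength, int rfamilylength) {
    // Assumed first step: compare the family portion.
    int diff = compareSlices(left, loffset, lfamilylength, right, roffset, rfamilylength);
    if (diff != 0) {
      return diff;
    }
    // Qualifier portion: pure offset arithmetic, nothing is re-parsed.
    return compareSlices(left, loffset + lfamilylength, llength - lfamilylength,
                         right, roffset + rfamilylength, rlength - rfamilylength);
  }
}
{code}

For example, with columns "cfq1" and "cfq2" and a family length of 2, compareColumns returns a negative value after one family comparison and one qualifier comparison.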
Whereas in the branch-2 code base:
{code}
"main" #1 prio=5 os_prio=0 tid=0x00007f4a48016000 nid=0x488a runnable [0x00007f4a507bb000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.hbase.util.Bytes$ConverterHolder$UnsafeConverter.toShort(Bytes.java:1533)
	at org.apache.hadoop.hbase.util.Bytes.toShort(Bytes.java:1127)
	at org.apache.hadoop.hbase.util.Bytes.toShort(Bytes.java:1111)
	at org.apache.hadoop.hbase.KeyValue.getRowLength(KeyValue.java:1337)
	at org.apache.hadoop.hbase.KeyValue.getFamilyOffset(KeyValue.java:1353)
	at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1368)
	at org.apache.hadoop.hbase.KeyValue.getQualifierLength(KeyValue.java:1406)
	at org.apache.hadoop.hbase.CellComparatorImpl.compareQualifiers(CellComparatorImpl.java:169)
	at org.apache.hadoop.hbase.CellComparatorImpl.compareColumns(CellComparatorImpl.java:105)
	at org.apache.hadoop.hbase.CellComparatorImpl.compareWithoutRow(CellComparatorImpl.java:266)
	at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:86)
	at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:67)
	at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:45)
	at java.util.TreeMap.put(TreeMap.java:552)
	at java.util.TreeSet.add(TreeSet.java:255)
	at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce1Row(PutSortReducer.java:191)
	at org.apache.hadoop.hbase.mapreduce.PutSortReducer.main(PutSortReducer.java:242)
{code}
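So we do more work per comparison when we have large rows (many cells per row).
In branch-2, every qualifier comparison re-derives the row length, family offset
and family length from the serialized KeyValue (getQualifierLength calls
getFamilyLength, which calls getFamilyOffset, which calls getRowLength, which
decodes a short via Bytes.toShort), whereas branch-1.3 compares the qualifier
bytes directly using offsets and lengths it already has. I suspect something
similar is happening in the other issue, where a scan filters out a large number
of rows (though I have not spent time on that).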
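A simplified model of why the branch-2 path costs more. The sketch assumes the usual KeyValue serialization: key length and value length as ints, then a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte type, then the value. KeyValueLayoutSketch and its methods are hypothetical illustrations, not HBase's KeyValue class; the rowLengthDecodes counter exists only to show that, in this model, a single qualifierLength call decodes the row length twice (once directly, once through familyOffset), following the chain seen in the stack trace.

{code:java}
import java.nio.ByteBuffer;

// Hypothetical, simplified model of the KeyValue layout (not HBase's class):
//   [keyLength:int][valueLength:int]
//   [rowLength:short][row][familyLength:byte][family][qualifier][timestamp:long][type:byte]
//   [value]
final class KeyValueLayoutSketch {
  static final int ROW_OFFSET = 8;          // keyLength + valueLength
  static final int TIMESTAMP_TYPE_SIZE = 9; // 8-byte timestamp + 1-byte type

  // Instrumentation only: counts how often the row length is decoded.
  static int rowLengthDecodes = 0;

  static byte[] build(byte[] row, byte[] family, byte[] qualifier,
                      long timestamp, byte type, byte[] value) {
    int keyLength = 2 + row.length + 1 + family.length + qualifier.length + TIMESTAMP_TYPE_SIZE;
    ByteBuffer buf = ByteBuffer.allocate(ROW_OFFSET + keyLength + value.length);
    buf.putInt(keyLength).putInt(value.length);
    buf.putShort((short) row.length).put(row);
    buf.put((byte) family.length).put(family).put(qualifier);
    buf.putLong(timestamp).put(type).put(value);
    return buf.array();
  }

  static int keyLength(byte[] kv) {
    return ByteBuffer.wrap(kv).getInt(0);
  }

  static int rowLength(byte[] kv) {
    rowLengthDecodes++;
    return ByteBuffer.wrap(kv).getShort(ROW_OFFSET);
  }

  static int familyOffset(byte[] kv) {
    // Skip the 2-byte row length, the row itself and the 1-byte family length.
    return ROW_OFFSET + 2 + rowLength(kv) + 1;
  }

  static int familyLength(byte[] kv) {
    return kv[familyOffset(kv) - 1] & 0xff;
  }

  // Re-derives every length from the bytes on each call, following the
  // getQualifierLength -> getFamilyLength -> getFamilyOffset -> getRowLength
  // chain seen in the branch-2 stack trace.
  static int qualifierLength(byte[] kv) {
    int rlength = rowLength(kv);
    int flength = familyLength(kv);
    return keyLength(kv) - (2 + rlength + 1 + flength + TIMESTAMP_TYPE_SIZE);
  }
}
{code}

In this model, comparing two cells' qualifiers costs at least four row-length decodes before any qualifier byte is read, and each TreeSet.add performs O(log n) such comparisons for every cell of the row being sorted.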
> Bulk load performance is degraded in HBase 2
> ---------------------------------------------
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
> Issue Type: Bug
> Components: Performance
> Affects Versions: 2.2.3
> Reporter: Ajeet Rai
> Priority: Major
> Attachments: Branch1.3_putSortReducer_sampleCode.patch,
> Branch2_putSortReducer_sampleCode.patch
>
>
> In our test, we observed that bulk load performance is degraded in HBase 2.
> Test Input:
> 1: Table with 500 regions (300 column families)
> 2: Data = 2 TB
> Data sample:
> 18600000001201502051000000068110,18600000001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111111111111111111111111111111111111111111111111111111111111111111111111111111111
> 3: Cluster: 7 nodes (2 Masters + 5 RegionServers)
> 4: The number of containers launched is the same in both cases.
> HBase 2 took about 10% more time than HBase 1.3, with the same test input on
> both clusters.
>
> ||Feature||HBase 2.2.3 Time (sec)||HBase 1.3.1 Time (sec)||Diff (%)||Snappy lib||
> |BulkLoad|21837|19686.16|-10.93|HBase 2.2.3: 1.4, HBase 1.3.1: 1.4|
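The Diff column appears to be the time difference relative to the HBase 1.3.1 run. Assuming that definition, the figure can be checked with a trivial helper (diffPercent is hypothetical, not part of any HBase tooling):

{code:java}
// Hypothetical helper: time difference of HBase 2.2.3 vs HBase 1.3.1,
// expressed as a percentage of the HBase 1.3.1 time (assumed definition of Diff).
final class BulkLoadDiff {
  static double diffPercent(double hbase223Seconds, double hbase131Seconds) {
    return (hbase131Seconds - hbase223Seconds) / hbase131Seconds * 100.0;
  }
}
{code}

diffPercent(21837, 19686.16) is about -10.93, i.e. HBase 2.2.3 took roughly 10.9% longer, consistent with the "about 10%" figure above.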