[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175168#comment-17175168
 ] 

Anoop Sam John commented on HBASE-24754:
----------------------------------------

We should optimize it at the CellComparatorImpl level itself so that all flows 
can take adv.   This can be an issue in the overall perf issue which deal with 
so many Cells and compares. (The other 2.x perf issue of filtering cells in a 
range scan - HBASE-24637 )
In the initial time of CellComparatorImpl , there were some optimizations and 
so many overloaded compareXXX methods which takes not just Cells but few 
offsets/lengths also.. I think eventually got cleaned up. But such cleanup 
affect perf very much is what we seeing now.
In case of KeyValue the biggest adv is that we know it is a single contiguous 
datastructure backed object and so have ways to parse offset/length with out 
doing back to back decoding of other lengths every time.  In a generic Cell and 
CellComparator such assumptions are not possible.   But normally in HBase most 
of the time, the Cells flowing will be KV or BBKV both backed by  contiguous 
datastructure ..  We can think of having a new interface to mark such Cells and 
a CellComparator impl to take adv of that.  This needs a bigger effort but its 
worth.

> Bulk load performance is degraded in HBase 2 
> ---------------------------------------------
>
>                 Key: HBASE-24754
>                 URL: https://issues.apache.org/jira/browse/HBASE-24754
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>    Affects Versions: 2.2.3
>            Reporter: Ajeet Rai
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Major
>         Attachments: Branc2_withComparator_atKeyValue.patch, 
> Branch1.3_putSortReducer_sampleCode.patch, 
> Branch2_putSortReducer_sampleCode.patch, flamegraph_branch-1_new.svg, 
> flamegraph_branch-2.svg, flamegraph_branch-2_afterpatch.svg
>
>
> in our Test,It is observed that Bulk load performance is degraded in HBase 2 .
>  Test Input: 
> 1: Table with 500 region(300 column family)
> 2:  data =2 TB
> Data Sample
> 18600000001201502051000000068110,18600000001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111111111111111111111111111111111111111111111111111111111111111111111111111111111
> 3: Cluster: 7 node(2 master+5 Region Server)
>  4: No of Container Launched are same in both case
> HBase 2 took 10% more time then HBase 1.3 where test input is same for both 
> cluster
>  
> |Feature|HBase 2.2.3
>  Time(Sec)|HBase 1.3.1
>  Time(Sec)|Diff%|Snappy lib:
>   |
> |BulkLoad|21837|19686.16|-10.93|Snappy lib:
>  HBase 2.2.3: 1.4
>  HBase 1.3.1: 1.4|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to