[
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250562#comment-17250562
]
Michael Stack commented on HBASE-24754:
---------------------------------------
I was chatting w/ a coworker; he talked of being able to make a call high up on what
types of Cells/KVs are involved and, before we start the task, choose the
CellComparator to use (he even suggested auto-generating the optimal one...).
Seems like you can do this when bulk loading: look at the file, figure out
what the Cell type is, and then choose a CellComparator to use, one w/ no
branching, shaped to fit the Cells it will see. Are we set up to allow
inserting a particular CellComparator to use in MR tasks? Good stuff.
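A minimal, self-contained Java sketch of the idea above: inspect the cells once, up front, then hand the task a comparator that does no per-call type branching. Every name here (ComparatorSelectionSketch, SimpleCell, ArrayCell, SliceCell, GENERIC, ARRAY_ONLY, choose) is hypothetical and is not HBase's CellComparator API. Only the row key is modeled, rows compare as unsigned bytes, and Java 9+ is assumed for the Arrays.compareUnsigned overloads.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: these are illustrative stand-ins, NOT HBase classes or APIs.
final class ComparatorSelectionSketch {

  /** Minimal stand-in for a cell; only the row key is modeled. */
  interface SimpleCell {
    byte[] row();
  }

  /** A cell that owns its row bytes (loosely, an array-backed KeyValue). */
  static final class ArrayCell implements SimpleCell {
    private final byte[] row;
    ArrayCell(byte[] row) { this.row = row; }
    @Override public byte[] row() { return row; }
  }

  /** A cell whose row is a slice of a shared buffer (loosely, a buffer-backed cell). */
  static final class SliceCell implements SimpleCell {
    private final byte[] buf;
    private final int off;
    private final int len;
    SliceCell(byte[] buf, int off, int len) { this.buf = buf; this.off = off; this.len = len; }
    @Override public byte[] row() { return Arrays.copyOfRange(buf, off, off + len); }
  }

  /** General-purpose comparator: re-checks the cell types on every call. */
  static final Comparator<SimpleCell> GENERIC = (a, b) -> {
    if (a instanceof SliceCell && b instanceof SliceCell) {
      SliceCell x = (SliceCell) a;
      SliceCell y = (SliceCell) b;
      // Compare both slices in place, without copying.
      return Arrays.compareUnsigned(x.buf, x.off, x.off + x.len, y.buf, y.off, y.off + y.len);
    }
    // Mixed or array-backed cells: unsigned lexicographic order on copied/owned rows.
    return Arrays.compareUnsigned(a.row(), b.row());
  };

  /**
   * Specialized comparator: assumes every cell is an ArrayCell, so there is no
   * per-call instanceof branching. A ClassCastException means the up-front choice was wrong.
   */
  static final Comparator<SimpleCell> ARRAY_ONLY =
      (a, b) -> Arrays.compareUnsigned(((ArrayCell) a).row, ((ArrayCell) b).row);

  /** Decide once, before the task starts, which comparator fits the sampled cells. */
  static Comparator<SimpleCell> choose(List<? extends SimpleCell> sample) {
    if (sample.isEmpty()) {
      return GENERIC; // nothing to inspect, so stay general
    }
    for (SimpleCell c : sample) {
      if (!(c instanceof ArrayCell)) {
        return GENERIC;
      }
    }
    return ARRAY_ONLY;
  }

  /** Convenience factory for an array-backed cell with a UTF-8 row key. */
  static ArrayCell cell(String row) {
    return new ArrayCell(row.getBytes(StandardCharsets.UTF_8));
  }
}
```

The trade-off is that ARRAY_ONLY is only safe when the inspected sample really covers every cell the task will see (for example, a whole input file known to hold a single cell type); otherwise the sketch falls back to GENERIC and keeps the branching.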
> Bulk load performance is degraded in HBase 2
> ---------------------------------------------
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
> Issue Type: Bug
> Components: Performance
> Affects Versions: 2.2.3
> Reporter: Ajeet Rai
> Assignee: ramkrishna.s.vasudevan
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: Branc2_withComparator_atKeyValue.patch,
> Branch1.3_putSortReducer_sampleCode.patch,
> Branch2_putSortReducer_sampleCode.patch, flamegraph_branch-1_new.svg,
> flamegraph_branch-2.svg, flamegraph_branch-2_afterpatch.svg
>
>
> In our test, we observed that bulk load performance is degraded in HBase 2.
> Test input:
> 1: Table with 500 regions (300 column families)
> 2: Data size: 2 TB
> Data sample:
> 18600000001201502051000000068110,18600000001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111111111111111111111111111111111111111111111111111111111111111111111111111111111
> 3: Cluster: 7 nodes (2 masters + 5 RegionServers)
> 4: The number of containers launched is the same in both cases
> HBase 2 took about 10% more time than HBase 1.3, with the same test input on
> both clusters.
>
> |Feature|HBase 2.2.3 Time (s)|HBase 1.3.1 Time (s)|Diff %|Snappy lib|
> |BulkLoad|21837|19686.16|-10.93|HBase 2.2.3: 1.4; HBase 1.3.1: 1.4|
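The Diff % column is consistent with a difference taken relative to the HBase 1.3.1 time. A quick check with the two times from the table:

```latex
\text{Diff\%} = \frac{T_{1.3.1} - T_{2.2.3}}{T_{1.3.1}} \times 100
             = \frac{19686.16 - 21837}{19686.16} \times 100
             = \frac{-2150.84}{19686.16} \times 100
             \approx -10.93
```

Measured against the HBase 2.2.3 time instead, the same 2150.84 s gap would be about 9.85%.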
--
This message was sent by Atlassian Jira
(v8.3.4#803005)