[
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175168#comment-17175168
]
Anoop Sam John commented on HBASE-24754:
----------------------------------------
We should optimize this at the CellComparatorImpl level itself so that all flows
can take advantage of it. This could be a factor in the overall perf issue, since
those paths deal with so many Cells and compares. (The other 2.x perf issue,
filtering cells in a range scan, is HBASE-24637.)
In the early days of CellComparatorImpl there were some optimizations, including
many overloaded compareXXX methods that took not just Cells but also a few
offsets/lengths. I think those eventually got cleaned up. What we are seeing now
is that such cleanup affects perf considerably.
In the case of KeyValue, the biggest advantage is that we know it is an object
backed by a single contiguous data structure, so we have ways to parse
offsets/lengths without decoding the other lengths back to back every time. In a
generic Cell and CellComparator such assumptions are not possible. But normally
in HBase, most of the time the Cells flowing through will be KV or BBKV, both
backed by a contiguous data structure. We can think of having a new interface to
mark such Cells and a CellComparator impl that takes advantage of that. This
needs a bigger effort, but it's worth it.
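To make the idea concrete, here is a rough sketch of what such a marker
interface and a comparator fast path could look like. Everything in it is
hypothetical and not existing HBase API: SimpleCell, ContiguousCell, ArrayCell,
ContiguousKeyCell and RowComparator are made-up names, and the layout
([2-byte row length][row bytes]) is a simplification of the real KeyValue
format. It assumes Java 9+ for Arrays.compareUnsigned.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a generic cell: the row is reached via getters.
interface SimpleCell {
    byte[] getRowArray();
    int getRowOffset();
    int getRowLength();
}

// Hypothetical marker: the key sits in one contiguous byte[] at a known offset.
interface ContiguousCell extends SimpleCell {
    byte[] getBackingArray();
    int getKeyOffset();
}

// Generic cell with no layout guarantee.
final class ArrayCell implements SimpleCell {
    private final byte[] row;
    ArrayCell(String row) { this.row = row.getBytes(StandardCharsets.UTF_8); }
    public byte[] getRowArray() { return row; }
    public int getRowOffset() { return 0; }
    public int getRowLength() { return row.length; }
}

// Assumed layout for this sketch only: [2-byte row length][row bytes].
final class ContiguousKeyCell implements ContiguousCell {
    private final byte[] buf;
    ContiguousKeyCell(String row) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        buf = new byte[2 + r.length];
        buf[0] = (byte) (r.length >>> 8);
        buf[1] = (byte) r.length;
        System.arraycopy(r, 0, buf, 2, r.length);
    }
    public byte[] getBackingArray() { return buf; }
    public int getKeyOffset() { return 0; }
    public byte[] getRowArray() { return buf; }
    public int getRowOffset() { return 2; }
    public int getRowLength() { return RowComparator.rowLength(buf, 0); }
}

final class RowComparator {
    // Decode the 2-byte big-endian row length stored at the key offset.
    static int rowLength(byte[] buf, int keyOffset) {
        return ((buf[keyOffset] & 0xff) << 8) | (buf[keyOffset + 1] & 0xff);
    }

    static int compareRows(SimpleCell a, SimpleCell b) {
        // Fast path: both cells are marked contiguous, so read the raw layout.
        if (a instanceof ContiguousCell && b instanceof ContiguousCell) {
            return compareContiguous((ContiguousCell) a, (ContiguousCell) b);
        }
        // Generic path: go through the per-field getters.
        return Arrays.compareUnsigned(
            a.getRowArray(), a.getRowOffset(), a.getRowOffset() + a.getRowLength(),
            b.getRowArray(), b.getRowOffset(), b.getRowOffset() + b.getRowLength());
    }

    static int compareContiguous(ContiguousCell a, ContiguousCell b) {
        byte[] ab = a.getBackingArray();
        byte[] bb = b.getBackingArray();
        int ao = a.getKeyOffset();
        int bo = b.getKeyOffset();
        // Row bytes start right after the 2-byte length in this assumed layout.
        return Arrays.compareUnsigned(
            ab, ao + 2, ao + 2 + rowLength(ab, ao),
            bb, bo + 2, bo + 2 + rowLength(bb, bo));
    }
}
```

With this shape, RowComparator.compareRows(new ContiguousKeyCell("row-1"),
new ContiguousKeyCell("row-2")) goes through the contiguous path, while any pair
involving an ArrayCell falls back to the getter-based path. A real
implementation would have to cover family, qualifier, timestamp and type as
well, and whether the instanceof check pays off would need to be measured.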
> Bulk load performance is degraded in HBase 2
> ---------------------------------------------
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
> Issue Type: Bug
> Components: Performance
> Affects Versions: 2.2.3
> Reporter: Ajeet Rai
> Assignee: ramkrishna.s.vasudevan
> Priority: Major
> Attachments: Branc2_withComparator_atKeyValue.patch,
> Branch1.3_putSortReducer_sampleCode.patch,
> Branch2_putSortReducer_sampleCode.patch, flamegraph_branch-1_new.svg,
> flamegraph_branch-2.svg, flamegraph_branch-2_afterpatch.svg
>
>
> In our test, it is observed that bulk load performance is degraded in HBase 2.
> Test Input:
> 1: Table with 500 regions (300 column family)
> 2: Data = 2 TB
> Data Sample
> 18600000001201502051000000068110,18600000001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111111111111111111111111111111111111111111111111111111111111111111111111111111111
> 3: Cluster: 7 nodes (2 masters + 5 Region Servers)
> 4: Number of containers launched is the same in both cases
> HBase 2 took 10% more time than HBase 1.3, where the test input is the same for
> both clusters
>
> |Feature|HBase 2.2.3 Time(Sec)|HBase 1.3.1 Time(Sec)|Diff%|Snappy lib|
> |BulkLoad|21837|19686.16|-10.93|HBase 2.2.3: 1.4, HBase 1.3.1: 1.4|
--
This message was sent by Atlassian Jira
(v8.3.4#803005)