[
https://issues.apache.org/jira/browse/HBASE-21734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zheng Hu updated HBASE-21734:
-----------------------------
Hadoop Flags: Reviewed
Release Note:
After HBASE-21620, the filterListWithOR has been a bit slow because we need to
merge each sub-filter's RC , while before HBASE-21620, we will skip many RC
merging, but the logic was wrong. So here we choose another way to optimaze the
performance: removing the KeyValueUtil#toNewKeyCell.
Anoop Sam John suggested that the KeyValueUtil#toNewKeyCell can save some GC
before because if we copy key part of cell into a single byte[], then the block
the cell refering won't be refered by the filter list any more, the upper layer
can GC the data block quickly. while after HBASE-21620, we will update the
prevCellList for every encountered cell now, so the lifecycle of cell in
prevCellList for FilterList will be quite shorter. so just use the cell ref for
saving cpu.
BTW, we removed all the arrays streams usage in filter list, because it's also
quite time-consuming in our test.
> Some optimization in FilterListWithOR
> -------------------------------------
>
> Key: HBASE-21734
> URL: https://issues.apache.org/jira/browse/HBASE-21734
> Project: HBase
> Issue Type: Sub-task
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.1.3, 2.0.5
>
> Attachments: HBASE-21734.branch-1.v1.patch, HBASE-21734.v1.patch,
> columnkey.txt, perf-ut.patch
>
>
> In HBASE-21620, [~KarthickRam] and [~mohamed.meeran] complaind that their
> performance of filter list has been degraded after that patch in here [1].
> I wrote a UT for this, and test under my host. It's true. I gussed there
> may be two reasons:
> 1. the comparator.compare(nextKV, cell) > 0 StoreScanner;
> 2. the filter list concated by OR will choose the minimal forward step among
> all sub-filters. in this patch, we have stricter restrictions on all sub
> filters, include those sub-filter whose has non-null RC returned in
> calculateReturnCodeByPrevCellAndRC (previously, we will skip to merge this
> sub-filter's rc, but it's wrong in some case), and merge all of the
> sub-filter's RC, this is also some time cost.
> The former one seems not the main problem, because the UT still cost ~ 3s
> even if I comment the compare. the second one has some impact indeed,
> because after i skip to merge the sub-filters's RC if
> calculateReturnCodeByPrevCellAndRC return a non-null rc, the UT cost ~ 1s,
> it's improvement but the logic is not wrong.
> 1.
> https://issues.apache.org/jira/browse/HBASE-21620?focusedCommentId=16737100&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16737100
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)