[jira] [Updated] (HBASE-21734) Some optimization in FilterListWithOR

Zheng Hu (JIRA) Mon, 21 Jan 2019 20:12:29 -0800


     [ 
https://issues.apache.org/jira/browse/HBASE-21734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Zheng Hu updated HBASE-21734:
-----------------------------
    Hadoop Flags: Reviewed
    Release Note: 
After HBASE-21620, FilterListWithOR became somewhat slower because we need to 
merge each sub-filter's return code (RC). Before HBASE-21620 we skipped much of 
the RC merging, but that logic was wrong. So here we choose another way to 
optimize performance: removing the KeyValueUtil#toNewKeyCell call. 
Anoop Sam John pointed out that KeyValueUtil#toNewKeyCell used to save some GC 
pressure: if we copy the key part of a cell into a separate byte[], the block 
that the cell refers to is no longer referenced by the filter list, so the 
upper layer can GC the data block quickly. After HBASE-21620, however, we 
update the prevCellList for every encountered cell, so the lifecycle of a cell 
in the FilterList's prevCellList is much shorter. We therefore just keep the 
cell reference to save CPU.
BTW, we also removed all uses of array streams in the filter list, because 
they were quite time-consuming in our tests.
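
The trade-off described above (copy the key and release the block, or keep 
the cell reference and skip the copy) can be illustrated with a minimal, 
self-contained sketch. BlockCell and copyKey are hypothetical stand-ins 
invented for this illustration; they are not HBase classes and do not 
reproduce what KeyValueUtil#toNewKeyCell actually does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class KeyCopySketch {
    /** Hypothetical stand-in for a cell whose key lives inside a larger shared block. */
    static final class BlockCell {
        final byte[] block;   // the whole data block backing this cell
        final int keyOffset;  // where the key starts inside the block
        final int keyLength;  // length of the key in bytes

        BlockCell(byte[] block, int keyOffset, int keyLength) {
            this.block = block;
            this.keyOffset = keyOffset;
            this.keyLength = keyLength;
        }

        /** Copy-based approach: allocates a new array that no longer references the block. */
        byte[] copyKey() {
            return Arrays.copyOfRange(block, keyOffset, keyOffset + keyLength);
        }
    }

    public static void main(String[] args) {
        byte[] block = new byte[64 * 1024];
        block[10] = 'r';
        block[11] = 'o';
        block[12] = 'w';
        BlockCell cell = new BlockCell(block, 10, 3);

        // Option A: copy the key. This costs an allocation and a copy per cell,
        // but holding keyCopy does not keep the 64 KB block reachable.
        byte[] keyCopy = cell.copyKey();

        // Option B: keep the cell reference. No copy is made, but the block stays
        // reachable for as long as 'prev' is held. That is cheap when 'prev' is
        // replaced on every cell, as the release note describes.
        BlockCell prev = cell;

        System.out.println(new String(keyCopy, StandardCharsets.US_ASCII)); // prints "row"
        System.out.println(prev.block == block);                            // prints "true"
    }
}
```

If the held reference is overwritten for every encountered cell, option B 
pins each block only briefly, which is the reasoning the note gives for 
dropping the copy.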

> Some optimization in FilterListWithOR
> -------------------------------------
>
>                 Key: HBASE-21734
>                 URL: https://issues.apache.org/jira/browse/HBASE-21734
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.1.3, 2.0.5
>
>         Attachments: HBASE-21734.branch-1.v1.patch, HBASE-21734.v1.patch, 
> columnkey.txt, perf-ut.patch
>
>
> In HBASE-21620, [~KarthickRam] and [~mohamed.meeran] complained that the 
> performance of their filter lists had degraded after that patch; see [1]. 
> I wrote a UT for this and tested it on my host. It's true. I guessed there 
> may be two reasons: 
> 1.  the comparator.compare(nextKV, cell) > 0 check in StoreScanner; 
> 2.  a filter list concatenated by OR chooses the minimal forward step among 
> all sub-filters. In this patch, we put stricter restrictions on all 
> sub-filters, including those sub-filters that get a non-null RC returned from 
> calculateReturnCodeByPrevCellAndRC (previously, we skipped merging such a 
> sub-filter's RC, but that is wrong in some cases), and we merge all of the 
> sub-filters' RCs, which also costs some time.
> The former does not seem to be the main problem, because the UT still costs 
> ~ 3s even if I comment out the compare. The second one does have some impact: 
> after I skip merging the sub-filters' RC when 
> calculateReturnCodeByPrevCellAndRC returns a non-null RC, the UT costs ~ 1s,
> which is an improvement, but that logic is wrong.
> 1. 
> https://issues.apache.org/jira/browse/HBASE-21620?focusedCommentId=16737100&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16737100
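
The "minimal forward step" rule from point 2 of the quoted description can 
be sketched roughly as follows. The Step enum and mergeForOr method are 
hypothetical simplifications made up for this illustration; the real 
Filter.ReturnCode set is larger (it also has INCLUDE variants and seek 
hints), and the actual merge logic in FilterListWithOR handles many more 
cases than this.

```java
import java.util.Arrays;
import java.util.List;

class OrMergeSketch {
    /** Hypothetical, simplified return codes, declared from smallest to largest forward step. */
    enum Step { SKIP, NEXT_COL, NEXT_ROW }

    /**
     * For an OR list, a cell that one sub-filter wants to jump past may still
     * be wanted by another, so the merged result is the smallest step that any
     * sub-filter asked for. Returns null for an empty list.
     */
    static Step mergeForOr(List<Step> subFilterSteps) {
        Step min = null;
        // Plain loop instead of a stream, since this runs once per cell.
        for (Step s : subFilterSteps) {
            if (min == null || s.compareTo(min) < 0) {
                min = s;
            }
        }
        return min;
    }

    public static void main(String[] args) {
        // One sub-filter wants to jump to the next row, another only to the next column.
        System.out.println(mergeForOr(Arrays.asList(Step.NEXT_ROW, Step.NEXT_COL))); // prints "NEXT_COL"
    }
}
```

Because every sub-filter's code has to be examined before the minimum is 
known, merging all of them costs more per cell than skipping some, which is 
consistent with the slowdown reported above.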



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
