[jira] [Comment Edited] (HBASE-22448) Scan is slow for Multiple Column prefixes

Zheng Hu (JIRA) Wed, 22 May 2019 20:08:25 -0700


    [ https://issues.apache.org/jira/browse/HBASE-22448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846396#comment-16846396 ]


Zheng Hu edited comment on HBASE-22448 at 5/23/19 3:07 AM:
-----------------------------------------------------------

To show the design, I drew a picture (see 
[^filter-list-with-or-internal-2.png]) of this FilterListWithOR and why this 
case is time-consuming. Assume we have a FilterList whose subFilters are all 
ColumnPrefixFilter instances:
{code:java}
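// MUST_PASS_ONE gives OR semantics; subFilter1..3 are ColumnPrefixFilter instances
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE, subFilter1, subFilter2, subFilter3);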
{code}
For cell-0, subFilter1#nextHint is cell-6, subFilter2#nextHint is cell-1 and 
subFilter3#nextHint is cell-7, so the filterList chooses the minimal next 
hint among all the subFilters, which is cell-1. That's correct.
For cell-1, subFilter1 needs a comparison to check whether the current cell has 
reached its previous next hint (cell-6), and subFilter3 needs the same 
comparison against its own hint (cell-7).
For cell-2, both subFilter1 and subFilter2 need the same comparison as for 
cell-1.
...
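The walkthrough above can be reduced to a small toy model. The class `OrHintSketch`, its methods and the encoding of cells as ints in scan order (cell-k == k) are hypothetical and for illustration only; this is not the HBase FilterListWithOR source. It shows the two points from the picture: the OR list can only seek to the minimal hint of its sub-filters, and on the following cells every sub-filter that has not yet reached its own hint still costs a comparison.

```java
import java.util.Arrays;

/**
 * Hypothetical toy model, not HBase's FilterListWithOR: cells are ints in
 * scan order (cell-k == k) and each sub-filter only remembers the seek hint
 * it returned last.
 */
public class OrHintSketch {
    private static final int NO_HINT = -1;

    private final int[] pendingHint;   // last hint per sub-filter, or NO_HINT
    private long extraComparisons = 0; // comparisons made while a sub-filter is still before its hint

    public OrHintSketch(int subFilterCount) {
        pendingHint = new int[subFilterCount];
        Arrays.fill(pendingHint, NO_HINT);
    }

    /** An OR list can only seek as far as the smallest hint of its sub-filters. */
    public static int minimalHint(int[] hints) {
        int min = Integer.MAX_VALUE;
        for (int h : hints) {
            min = Math.min(min, h);
        }
        return min;
    }

    /** Remember each sub-filter's hint for the current cell; return the hint the list seeks to. */
    public int recordHints(int[] hints) {
        System.arraycopy(hints, 0, pendingHint, 0, pendingHint.length);
        return minimalHint(hints);
    }

    /**
     * On a later cell, each sub-filter holding a hint compares the cell with it.
     * Sub-filters still before their hint count as "extra" comparisons; those that
     * have reached it drop the hint. Returns the extra comparisons for this cell.
     */
    public int advanceTo(int cell) {
        int extra = 0;
        for (int i = 0; i < pendingHint.length; i++) {
            if (pendingHint[i] == NO_HINT) {
                continue;
            }
            if (cell < pendingHint[i]) {
                extra++;                  // compared, hint not reached yet
            } else {
                pendingHint[i] = NO_HINT; // hint reached
            }
        }
        extraComparisons += extra;
        return extra;
    }

    public long extraComparisons() {
        return extraComparisons;
    }

    public static void main(String[] args) {
        OrHintSketch list = new OrHintSketch(3);
        // cell-0: hints are cell-6, cell-1, cell-7, so the list seeks to cell-1
        System.out.println("seek to cell-" + list.recordHints(new int[] {6, 1, 7}));
        // cell-1: subFilter1 (hint cell-6) and subFilter3 (hint cell-7) still compare
        System.out.println("extra comparisons at cell-1: " + list.advanceTo(1));
    }
}
```

Running `main` prints `seek to cell-1` followed by `extra comparisons at cell-1: 2`, matching the cell-0 and cell-1 steps above. The model deliberately leaves out what a sub-filter does once it reaches its hint, so it does not try to reproduce the cell-2 step.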

All the black dots are the extra comparisons compared with branch-1.2*. 
branch-1.2 was designed in a wrong way, which is why it seems more efficient.
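As a rough, hedged estimate for the workload in the issue description (100 column prefixes, about 10 lakh, i.e. 10^6, columns in one row), assume as in the picture that each cell may need at most one hint comparison per ColumnPrefixFilter sub-filter, and one cell per column. With P the number of sub-filters and N the number of cells scanned in the row, the hint comparisons for that row are then bounded by:

```latex
C_{\text{hint}} \;\le\; P \cdot N \;=\; 100 \times 10^{6} \;=\; 10^{8}
```

This is only an upper bound under those assumptions, not a measured count.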


was (Author: openinx):
To show the design, I drew a picture (see 
https://issues.apache.org/jira/secure/attachment/12969462/filter-list-with-or-internal-2.png) 
of this FilterListWithOR and why this case is time-consuming. Assume we have 
a FilterList and each subFilter is a ColumnPrefixFilter:
{code}
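// MUST_PASS_ONE gives OR semantics; subFilter1..3 are ColumnPrefixFilter instances
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE, subFilter1, subFilter2, subFilter3);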
{code}
For cell-0, subFilter1#nextHint is cell-6, subFilter2#nextHint is cell-1 and 
subFilter3#nextHint is cell-7, so the filterList chooses the minimal next 
hint among all the subFilters, which is cell-1. That's correct.
For cell-1, subFilter1 needs a comparison to check whether the current cell has 
reached its previous next hint (cell-6), and subFilter3 also needs the 
comparison.
For cell-2, both subFilter1 and subFilter2 need the same comparison as for 
cell-1.
...

All the black dots are the extra comparisons compared with branch-1.2*. 
branch-1.2 was designed in a wrong way, which is why it seems more efficient.



> Scan is slow for Multiple Column prefixes
> -----------------------------------------
>
>                 Key: HBASE-22448
>                 URL: https://issues.apache.org/jira/browse/HBASE-22448
>             Project: HBase
>          Issue Type: Bug
>          Components: Scanners
>    Affects Versions: 1.4.8, 1.4.9
>            Reporter: Karthick
>            Assignee: Zheng Hu
>            Priority: Critical
>              Labels: prefix, scan, scanner
>             Fix For: 1.5.0, 1.4.10
>
>         Attachments: 0001-benchmark-UT.patch, HBaseFileImport.java, 
> filter-list-with-or-internal-2.png, 
> org.apache.hadoop.hbase.filter.TestSlowColumnPrefix-output.zip, 
> qualifiers.txt, scanquery.txt
>
>
> While scanning a row (around 10 lakh, i.e. 1 million, columns) with 100 
> column prefixes, the scan takes around 4 seconds in hbase-1.2.5, but the same 
> query takes around 50 seconds in hbase-1.4.9.
> Is there any way to optimise this?
>  
> *P.S:*
> We have applied the patches provided in 
> [-HBASE-21620-|https://jira.apache.org/jira/browse/HBASE-21620] and 
> [-HBASE-21734-|https://jira.apache.org/jira/browse/HBASE-21734]. Attached is 
> the *qualifiers.txt* file, which contains the column keys. Use the provided 
> *HBaseFileImport.java* file to populate your table and use 
> *scanquery.txt* to query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
