[jira] [Commented] (HBASE-22448) Scan is slow for Multiple Column prefixes

Zheng Hu (JIRA) Wed, 22 May 2019 23:57:44 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-22448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846481#comment-16846481
 ]


Zheng Hu commented on HBASE-22448:
----------------------------------

Hi [~Karthick], there's a very nice suggestion from [~anoop.hbase]: try using 
MultipleColumnPrefixFilter instead of the complex FilterList. I rewrote the 
filter as follows: 
{code}
    List<byte[]> prefixes = new ArrayList<>();
    for (String prefix : WORDS) {
      prefixes.add(Bytes.toBytes(prefix));
    }
    MultipleColumnPrefixFilter filter =
        new MultipleColumnPrefixFilter(prefixes.toArray(new byte[0][]));
    // Add filter
    Scan scan = new Scan();
    scan.setFilter(filter);
    try (ResultScanner scanner = table.getScanner(scan)) {
      long startTime = System.currentTimeMillis();
      int count = 0;
      for (Result r : scanner) {
        count++;
      }
      LOG.info("Total time consumed: {} (ms). count: {}",
          System.currentTimeMillis() - startTime, count);
    }
{code}

With this filter the scan took less than 400 ms. It's a well-optimized filter, 
so you can use it. 
{code}
2019-05-23 14:48:10,458 INFO  [main] regionserver.TestScanBenchmark(98): Total time consumed: 329 (ms). count: 1
{code}
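
For intuition about why a single sorted set of prefixes can beat an OR-list 
of per-prefix filters, here is a simplified, self-contained model. It is not 
HBase code: the class and method names (PrefixMatchSketch, matchesLinear, 
matchesSorted, nextPrefix) are hypothetical, and the sorted lookup assumes 
that no prefix is itself a prefix of another one.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Illustrative model only; NOT the HBase implementation.
class PrefixMatchSketch {

  // OR-list style: test the qualifier against every prefix, O(P) per column.
  static boolean matchesLinear(List<byte[]> prefixes, byte[] qualifier) {
    for (byte[] prefix : prefixes) {
      if (startsWith(qualifier, prefix)) {
        return true;
      }
    }
    return false;
  }

  // Sorted-set style: one O(log P) lookup per column.
  // Assumption: no prefix in the set is a prefix of another prefix.
  static boolean matchesSorted(TreeSet<byte[]> sorted, byte[] qualifier) {
    byte[] candidate = sorted.floor(qualifier);
    return candidate != null && startsWith(qualifier, candidate);
  }

  // For a non-matching qualifier: the smallest prefix greater than it, i.e.
  // where a scanner could jump next. Returns null when no prefix remains.
  static byte[] nextPrefix(TreeSet<byte[]> sorted, byte[] qualifier) {
    return sorted.higher(qualifier);
  }

  // Builds the sorted set using unsigned lexicographic byte order.
  static TreeSet<byte[]> sortedSet(List<byte[]> prefixes) {
    Comparator<byte[]> byUnsignedBytes = Arrays::compareUnsigned;
    TreeSet<byte[]> set = new TreeSet<>(byUnsignedBytes);
    set.addAll(prefixes);
    return set;
  }

  static boolean startsWith(byte[] value, byte[] prefix) {
    return value.length >= prefix.length
        && Arrays.equals(value, 0, prefix.length, prefix, 0, prefix.length);
  }
}
```

With 100 prefixes and a row of around a million columns, the linear variant 
does up to 100 comparisons per column, while the sorted variant does a 
handful and can also tell the caller which prefix to skip ahead to.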

So I plan to resolve this issue as Won't Fix. Please feel free to reopen it 
or comment if you have any other concerns. Thanks [~anoop.hbase], [~ram_krish]. 

> Scan is slow for Multiple Column prefixes
> -----------------------------------------
>
>                 Key: HBASE-22448
>                 URL: https://issues.apache.org/jira/browse/HBASE-22448
>             Project: HBase
>          Issue Type: Bug
>          Components: Scanners
>    Affects Versions: 1.4.8, 1.4.9
>            Reporter: Karthick
>            Assignee: Zheng Hu
>            Priority: Critical
>              Labels: prefix, scan, scanner
>             Fix For: 1.5.0, 1.4.10
>
>         Attachments: 0001-benchmark-UT.patch, HBaseFileImport.java, 
> filter-list-with-or-internal-2.png, 
> org.apache.hadoop.hbase.filter.TestSlowColumnPrefix-output.zip, 
> qualifiers.txt, scanquery.txt
>
>
> While scanning a row (around 10 lakh, i.e. 1 million, columns) with 100 
> column prefixes, it takes around 4 seconds in hbase-1.2.5, but the same 
> query takes around 50 seconds in hbase-1.4.9.
> Is there any way to optimise this?
>  
> *P.S:*
> We have applied the patches provided in 
> [-HBASE-21620-|https://jira.apache.org/jira/browse/HBASE-21620] and  
> [-HBASE-21734-|https://jira.apache.org/jira/browse/HBASE-21734]. Attached 
> is the *qualifiers.txt* file, which contains the column keys. Use the 
> provided *HBaseFileImport.java* file to populate your table and use 
> *scanquery.txt* to query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
