[ 
https://issues.apache.org/jira/browse/HBASE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Latham resolved HBASE-1652.
--------------------------------

    Resolution: Won't Fix

Scans have stop rows as of HBase 0.20, so the StopRowFilter is no longer 
needed.

> Scanners for sparse column not stopped by StopRowFilter
> -------------------------------------------------------
>
>                 Key: HBASE-1652
>                 URL: https://issues.apache.org/jira/browse/HBASE-1652
>             Project: HBase
>          Issue Type: Bug
>          Components: filters, regionserver
>    Affects Versions: 0.19.3
>            Reporter: Dave Latham
>
> Scanning a sparse column over a narrow range of rows can take far longer than 
> expected because the check for the end of the range is not performed on new 
> rows unless there is a column match, so it may end up scanning an entire 
> region or table.
> Background:
> I have a table with 1 billion+ rows, and one cell in each row, generally 
> small (10-1000 bytes).  The columns are all in a single family and fairly 
> sparse.  For one query, I scan a (usually narrow) range of the table for 
> the first 30 cells in a certain column.  I know that all the 
> rows that contain that column lie within a certain range.  I use 
> HTable.getScanner(byte[][] columns, byte[] startRow, RowFilterInterface 
> filter) passing it the particular column I'm looking for, a startRow, and a 
> filter set containing a StopRowFilter wrapped in a WhileMatchRowFilter to 
> enforce the end of the range.  Sometimes the query is very fast (< 1 sec), 
> but if the table doesn't contain 30 rows with that column, it can be very 
> slow, a minute or two.  I expected that since the range was small, for 
> example, just 120 rows, the query wouldn't take long to scan the rows.
> After some pondering and perusing of the source code, I think I understand 
> what is going on.  It looks like the Scanner is scanning the rest of the 
> table to find rows containing the column without allowing the StopRowFilter 
> to stop the scan at the end of the range.  I think I can work around this by 
> not specifying the column I want in the getScanner() method and instead 
> putting an additional filter in the filter set to filter out other columns.
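
The behavior described above can be illustrated with a small, self-contained 
model. This is only a sketch and not HBase code: the class SparseScanDemo, its 
methods, the 1000-row table, and the row keys are all hypothetical. It compares 
a scan that checks the end of the range only on rows containing the requested 
column (the behavior suspected in the report) with a scan that checks an 
exclusive stop row on every row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of the reported behavior; it does not use any HBase classes.
class SparseScanDemo {

    /** Outcome of one simulated scan: matching row keys and number of rows read. */
    static final class ScanOutcome {
        public final List<String> matches = new ArrayList<>();
        public int rowsVisited = 0;
    }

    /** Builds 1000 sorted rows; only rows 10 and 20 hold the "sparse" column. */
    static NavigableMap<String, Set<String>> buildDemoTable() {
        NavigableMap<String, Set<String>> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            Set<String> columns = new HashSet<>(Arrays.asList("other"));
            if (i == 10 || i == 20) {
                columns.add("sparse");
            }
            table.put(String.format("row%04d", i), columns);
        }
        return table;
    }

    /** Models the suspected bug: the end-of-range check runs only on a column match. */
    static ScanOutcome scanCheckingStopOnlyOnMatch(
            NavigableMap<String, Set<String>> table, String column,
            String startRow, String stopRow, int limit) {
        ScanOutcome out = new ScanOutcome();
        for (Map.Entry<String, Set<String>> row : table.tailMap(startRow, true).entrySet()) {
            out.rowsVisited++;
            if (!row.getValue().contains(column)) {
                continue; // no column match, so the stop check below is never reached
            }
            if (row.getKey().compareTo(stopRow) >= 0) {
                break; // end of range, but only noticed on a matching row
            }
            out.matches.add(row.getKey());
            if (out.matches.size() >= limit) {
                break;
            }
        }
        return out;
    }

    /** Checks the (exclusive) stop row on every row before looking at columns. */
    static ScanOutcome scanCheckingStopOnEveryRow(
            NavigableMap<String, Set<String>> table, String column,
            String startRow, String stopRow, int limit) {
        ScanOutcome out = new ScanOutcome();
        for (Map.Entry<String, Set<String>> row : table.tailMap(startRow, true).entrySet()) {
            if (row.getKey().compareTo(stopRow) >= 0) {
                break; // the range ends here regardless of which columns the row has
            }
            out.rowsVisited++;
            if (row.getValue().contains(column)) {
                out.matches.add(row.getKey());
                if (out.matches.size() >= limit) {
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, Set<String>> table = buildDemoTable();
        ScanOutcome lazy = scanCheckingStopOnlyOnMatch(table, "sparse", "row0000", "row0120", 30);
        ScanOutcome eager = scanCheckingStopOnEveryRow(table, "sparse", "row0000", "row0120", 30);
        System.out.println("stop check on match only: visited " + lazy.rowsVisited + " rows");
        System.out.println("stop check on every row:  visited " + eager.rowsVisited + " rows");
    }
}
```

In this model both scans return the same two rows, but the first reads all 
1000 rows because no row past the end of the range contains the column, while 
the second reads only the 120 rows inside the range.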

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
