[
https://issues.apache.org/jira/browse/HBASE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dave Latham resolved HBASE-1652.
--------------------------------
Resolution: Won't Fix
Scans have stop rows as of HBase 0.20, so the StopRowFilter is no longer
needed.
> Scanners for sparse column not stopped by StopRowFilter
> -------------------------------------------------------
>
> Key: HBASE-1652
> URL: https://issues.apache.org/jira/browse/HBASE-1652
> Project: HBase
> Issue Type: Bug
> Components: filters, regionserver
> Affects Versions: 0.19.3
> Reporter: Dave Latham
>
> Scanning a sparse column over a narrow range of rows can take far longer than
> expected because the check for the end of the range is not performed on new
> rows unless there is a column match, so it may end up scanning an entire
> region or table.
> Background:
> I have a table with 1 billion+ rows, and one cell in each row, generally
> small (10-1000 bytes). The columns are all in a single family and fairly
> sparse. For one query, I scan what is usually a narrow range of the table
> for the first 30 cells in a certain column. I know that all the
> rows that contain that column lie within a certain range. I use
> HTable.getScanner(byte[][] columns, byte[] startRow, RowFilterInterface
> filter) passing it the particular column I'm looking for, a startRow, and a
> filter set containing a StopRowFilter wrapped in a WhileMatchRowFilter to
> enforce the end of the range. Sometimes the query is very fast (< 1 sec),
> but if the table doesn't contain 30 rows with that column, it can be very
> slow, a minute or two. I expected that since the range was small, for
> example, just 120 rows, the query wouldn't take long to scan the rows.
> After some pondering and perusing of the source code, I think I understand
> what is going on. It looks like the Scanner is scanning the rest of the
> table to find rows containing the column without allowing the StopRowFilter
> to stop the scan at the end of the range. I think I can work around this by
> not specifying the column I want in the getScanner() method and instead
> putting an additional filter in the filter set to filter out other columns.
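
The behavior described above can be illustrated with a small self-contained simulation. This is only a sketch under stated assumptions, not the actual HBase scanner code: the class SparseScanSketch, its methods, the in-memory TreeMap table, and the row and column names are all hypothetical. It assumes the stop row is exclusive and compares a scan that consults the stop check only on rows containing the requested column with one that consults it on every row.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical simulation of the reported behavior; not HBase code.
final class SparseScanSketch {

    /** Outcome of one simulated scan. */
    public static final class Result {
        public final List<String> matches = new ArrayList<>();
        public int rowsExamined = 0;
    }

    /**
     * Scans rows from startRow (inclusive) toward stopRow (assumed exclusive),
     * collecting up to 'limit' rows that contain 'column'.
     * If checkStopOnEveryRow is false, the stop check is reached only when a
     * row contains the column, which mimics the behavior in the report.
     */
    public static Result scan(TreeMap<String, Set<String>> table, String column,
                              String startRow, String stopRow, int limit,
                              boolean checkStopOnEveryRow) {
        Result result = new Result();
        for (Map.Entry<String, Set<String>> e : table.tailMap(startRow, true).entrySet()) {
            String row = e.getKey();
            boolean pastStop = row.compareTo(stopRow) >= 0;
            if (checkStopOnEveryRow && pastStop) {
                break; // range ends here regardless of column content
            }
            result.rowsExamined++;
            if (!e.getValue().contains(column)) {
                continue; // no column match: the stop check is never consulted
            }
            if (pastStop) {
                break; // stop check only reached after a column match
            }
            result.matches.add(row);
            if (result.matches.size() >= limit) {
                break;
            }
        }
        return result;
    }

    /** Builds 10,000 rows; only rows 100..104 hold the sparse column "wanted". */
    public static TreeMap<String, Set<String>> sampleTable() {
        TreeMap<String, Set<String>> table = new TreeMap<>();
        for (int i = 0; i < 10000; i++) {
            String row = String.format(Locale.ROOT, "row%05d", i);
            String column = (i >= 100 && i < 105) ? "wanted" : "other";
            table.put(row, Collections.singleton(column));
        }
        return table;
    }

    public static void main(String[] args) {
        TreeMap<String, Set<String>> table = sampleTable();
        // A 120-row range that holds fewer than 30 matching rows.
        Result onMatchOnly = scan(table, "wanted", "row00100", "row00220", 30, false);
        Result onEveryRow = scan(table, "wanted", "row00100", "row00220", 30, true);
        System.out.println("stop checked on column match only: examined "
                + onMatchOnly.rowsExamined + " rows, found " + onMatchOnly.matches.size());
        System.out.println("stop checked on every row: examined "
                + onEveryRow.rowsExamined + " rows, found " + onEveryRow.matches.size());
    }
}
```

In this toy table both variants find the same 5 rows, but the variant that checks the stop row only on a column match walks all 9900 rows from the start row to the end of the table, while the other examines just the 120 rows in the range.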