On Mon, Jul 13, 2009 at 9:11 PM, Dave Latham <[email protected]> wrote:

>
> After some pondering and perusing of the source code, I think I understand
> what is going on.  It looks like the Scanner is scanning the rest of the
> table to find rows containing the column without allowing the StopRowFilter
> to stop the scan at the end of the range.  I think I can work around this by
> not specifying the column I want in the getScanner() method and instead
> putting an additional filter in the filter set to filter out other columns,
> but I had some questions.
>
> 1.  Is my understanding correct?


Sounds plausible, Dave.  StopRowFilter is activated on the call to filterRowKey
up in StoreScanner, the aggregator of the Memcache and StoreFile scanners.  If
no column matches, the store file or memcache scanners won't let a row up to
the StoreScanner level for filterRowKey to act on.
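To make the above concrete, here is a toy model of the behavior, not the real HBase code.  All class and method names in it are made up for illustration; the only assumption carried over from the description is that the inner scanner skips rows lacking the column and the stop-row check lives one level up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: hypothetical names, not HBase's actual scanner classes.
class StopRowSketch {
    // Rows the inner scanner touched during the most recent scan.
    static int rowsExamined;

    // Six rows; only row1 carries column "a:x".
    static TreeMap<String, Set<String>> sampleTable() {
        TreeMap<String, Set<String>> t = new TreeMap<>();
        t.put("row1", Collections.singleton("a:x"));
        for (int i = 2; i <= 6; i++) {
            t.put("row" + i, Collections.singleton("b:y"));
        }
        return t;
    }

    // Inner scanner: skips rows lacking the column, so the caller never sees them.
    static String nextRowWithColumn(Iterator<Map.Entry<String, Set<String>>> it,
                                    String column) {
        while (it.hasNext()) {
            Map.Entry<String, Set<String>> e = it.next();
            rowsExamined++;
            if (e.getValue().contains(column)) {
                return e.getKey();
            }
        }
        return null; // ran off the end of the table
    }

    // Outer scanner: the stop-row check only sees rows the inner scanner hands up.
    static List<String> scan(TreeMap<String, Set<String>> table, String startRow,
                             String stopRow, String column) {
        rowsExamined = 0;
        List<String> out = new ArrayList<>();
        Iterator<Map.Entry<String, Set<String>>> it =
            table.tailMap(startRow, true).entrySet().iterator();
        String row;
        while ((row = nextRowWithColumn(it, column)) != null) {
            if (row.compareTo(stopRow) >= 0) {
                break; // stop row reached, but only checked for matching rows
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = scan(sampleTable(), "row1", "row3", "a:x");
        System.out.println(rows + " rows examined: " + rowsExamined);
    }
}
```

With this sample table the scan of [row1, row3) returns only row1, yet the inner scanner touches all six rows, because no row past row1 has the column and so nothing ever reaches the stop-row check.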



> 2.  Is there a better way to get the behavior I'm looking for?



Well, I think this is a bug.

One fix might be to pass the filterRow down into each of the individual
scanners, down into memcache and storefile scanners.
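Roughly what I mean, as a toy sketch rather than a patch.  The names here are hypothetical and are not the real memcache/storefile scanner classes; the point is only that the row-key check runs inside the inner scanner, before the column match.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: hypothetical names, not HBase's actual scanner classes.
class PushedDownStopRowSketch {
    // Rows the inner scanner column-checked during the most recent scan.
    static int rowsExamined;

    // Six rows; only row1 carries column "a:x".
    static TreeMap<String, Set<String>> sampleTable() {
        TreeMap<String, Set<String>> t = new TreeMap<>();
        t.put("row1", Collections.singleton("a:x"));
        for (int i = 2; i <= 6; i++) {
            t.put("row" + i, Collections.singleton("b:y"));
        }
        return t;
    }

    // Inner scanner that also receives the stop row and checks it on every row
    // key, before looking at columns.
    static String nextRowWithColumn(Iterator<Map.Entry<String, Set<String>>> it,
                                    String column, String stopRow) {
        while (it.hasNext()) {
            Map.Entry<String, Set<String>> e = it.next();
            if (e.getKey().compareTo(stopRow) >= 0) {
                return null; // past the range: end the scan here
            }
            rowsExamined++;
            if (e.getValue().contains(column)) {
                return e.getKey();
            }
        }
        return null; // ran off the end of the table
    }

    // Outer scanner: just collects what the inner scanner hands up.
    static List<String> scan(TreeMap<String, Set<String>> table, String startRow,
                             String stopRow, String column) {
        rowsExamined = 0;
        List<String> out = new ArrayList<>();
        Iterator<Map.Entry<String, Set<String>>> it =
            table.tailMap(startRow, true).entrySet().iterator();
        String row;
        while ((row = nextRowWithColumn(it, column, stopRow)) != null) {
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = scan(sampleTable(), "row1", "row3", "a:x");
        System.out.println(rows + " rows examined: " + rowsExamined);
    }
}
```

On a six-row table where only row1 has the column, a scan of [row1, row3) column-checks two rows (row1 and row2) and stops at row3 instead of walking on to row6.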



>
> 3.  Has this behavior changed in 0.20 or will scanners for a given column
> still go on through an entire table without being stopped by a filter?



Looking at it, TRUNK appears to have the same issue.  We peek a queue of
Scanners.  Unless a row has the column, the row won't come out of the peek.



>
> 4.  Would it make sense for HBase scanners to support filters that can
> terminate a scanner sooner?



At least for the first filter step of filterRow, yes.  Please file an issue,
Dave.
St.Ack
