I've opened HBASE-1190 for it. Looking through the other code, it seems the pattern is to wrap a StopRowFilter in a WhileMatchRowFilter so that it will filterAllRemaining once it hits the stop row, so I've submitted a patch to do that. It does seem, however, like the StopRowFilter should know to filterAllRemaining itself once the stop row is reached, and not require a WhileMatchRowFilter.
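For illustration, here's a rough sketch of the wrapping pattern I mean (this is not the exact code from the patch I attached to HBASE-1190, and the constructor details may differ slightly from what's in the current filter package):

    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.hbase.filter.RowFilterInterface;
    import org.apache.hadoop.hbase.filter.RowFilterSet;
    import org.apache.hadoop.hbase.filter.StopRowFilter;
    import org.apache.hadoop.hbase.filter.WhileMatchRowFilter;

    public class SplitFilterSketch {

      /**
       * Combine the caller's row filter with a stop-row filter for the split.
       * Wrapping the StopRowFilter in a WhileMatchRowFilter means
       * filterAllRemaining() flips to true once the stop row is reached, so
       * the scan can end at the split boundary instead of running on to the
       * end of the table.
       */
      public static RowFilterInterface buildSplitFilter(
          RowFilterInterface userFilter, byte[] splitEndRow) {
        RowFilterInterface stopAtSplitEnd =
            new WhileMatchRowFilter(new StopRowFilter(splitEndRow));

        Set<RowFilterInterface> filters = new HashSet<RowFilterInterface>();
        filters.add(stopAtSplitEnd);
        filters.add(userFilter);
        // Every filter in the set must pass for a row to be returned.
        return new RowFilterSet(filters);
      }
    }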
Dave

On Sat, Feb 7, 2009 at 1:21 PM, stack <[email protected]> wrote:

> On Wed, Feb 4, 2009 at 4:09 PM, Dave Latham <[email protected]> wrote:
>
> > In order to speed up a map reduce job operating on HBase input data, we
> > recently added a RowFilter to the input format. However, when trying to
> > execute it, map tasks (one per region) that used to take 1-2 minutes began
> > timing out after 10 minutes. So I dug in to TableInputFormatBase to see how
> > it handles a row filter, and it appears to take our filter and combine it
> > with a StopRowFilter in order to scan the proper split, since there is no
> > getScanner method that can accept both a stop row and a row filter. Digging
> > further in to the scanning / filtering, it looks like it continues scanning
> > until filterAllRemaining returns true. However,
> > StopRowFilter.filterAllRemaining() always returns false. So if my
> > understanding is correct, every split in this task will end up scanning to
> > the end of the table and testing every row with the filter instead of
> > simply stopping at the end of its given split. That would explain why my
> > map tasks began taking longer (instead of shorter).
> >
> > 1. Is my understanding correct? (aka is this a bug? If so, I don't see an
> > existing JIRA issue for it -- I can open one if no one else does.)
>
> Sounds like a bug (and an explanation for long-running jobs) but, IIUC, the
> stop row filter is supposed to have a 'stop row' embedded, and once the
> filter passes it, then we stop filtering? If that's not going on, let's fix it.
>
> St.Ack
> P.S. Thanks for digging in.
