And let me follow up a bit: the best configuration for a MapReduce job is to have the number of map tasks equal the number of regions in the table. A scanner can iterate across regions, but once the table gets really big, it's best (and, in my experience, more reliable) to have a 1:1 correspondence between map tasks and regions.
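To illustrate the 1:1 idea, here is a toy Python model. It is not HBase's TableInputFormat, and REGION_START_KEYS, splits_for_regions and region_for_row are hypothetical names. It assumes a table is described only by the sorted start keys of its regions and emits one (start_row, exclusive stop_row) split per region, so each map task would scan exactly one region.

```python
from bisect import bisect_right

# Hypothetical table layout: each region is identified by its start row key.
# The first region starts at the empty key "" and the last one runs to the
# end of the table. Toy model only, not HBase code.
REGION_START_KEYS = ["", "g", "n", "t"]


def splits_for_regions(start_keys):
    """Return one (start_row, stop_row) split per region.

    stop_row is exclusive; None means "to the end of the table".
    """
    splits = []
    for i, start in enumerate(start_keys):
        stop = start_keys[i + 1] if i + 1 < len(start_keys) else None
        splits.append((start, stop))
    return splits


def region_for_row(start_keys, row):
    """Index of the region whose key range contains `row`."""
    return bisect_right(start_keys, row) - 1


if __name__ == "__main__":
    # One map task per region: task i scans only region i's key range.
    for task_id, (start, stop) in enumerate(splits_for_regions(REGION_START_KEYS)):
        print(f"map task {task_id}: rows [{start!r}, {stop!r})")
```

Because every split's start row falls in the region with the same index, no map task's scan has to cross a region boundary.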
-ryan

On Mon, Jun 15, 2009 at 1:55 AM, Ryan Rawson wrote:
> Hey,
>
> The client-side scanner code already moves on to the next region when
> it hits the end of a region.
>
> -ryan
>
> On Mon, Jun 15, 2009 at 1:52 AM, Piotr Praczyk wrote:
>> 2009/6/12 stack
>>
>> > On Fri, Jun 12, 2009 at 8:41 AM, Erik Holstad wrote:
>> >
>> > > ...
>> > > not really sure how this
>> > > was done in 0.19 and earlier.
>> >
>> > There's a stoprow filter in 0.19.x and earlier. There is also a getScanner
>> > override that takes a start and stop row in 0.19.x (under the covers it uses
>> > the stop row filter -- check the client source).
>> > St.Ack
>>
>> Thanks :-) It was very helpful.
>> Do you know if there is a standard Scanner that can iterate over more
>> than one table fragment? [When one chunk finishes, it would jump to the
>> beginning of the next.] Or should I implement it myself?
>>
>> Piotr
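To make the thread concrete, here is a toy, in-memory Python model of the behavior Ryan and stack describe. It is not the HBase client API; REGIONS, scan and the row keys are all hypothetical. The scanner takes a start row and an exclusive stop row (the role the stop-row filter plays) and, when it exhausts one region, simply continues into the next one.

```python
from bisect import bisect_left, bisect_right

# Hypothetical in-memory "table": each region holds the sorted row keys in
# [its start key, next region's start key). Toy model, not HBase code.
REGIONS = [
    ("", ["a1", "b2", "f9"]),
    ("g", ["g1", "k4"]),
    ("n", ["n0", "q7", "s3"]),
    ("t", ["t5", "z9"]),
]


def scan(regions, start_row="", stop_row=None):
    """Yield rows in [start_row, stop_row) across region boundaries.

    Mimics a client-side scanner: when one region is exhausted it moves on
    to the next region instead of stopping. stop_row is exclusive, like a
    stop-row filter; None means scan to the end of the table.
    """
    start_keys = [start for start, _ in regions]
    # Begin in the region whose key range contains start_row.
    first = max(bisect_right(start_keys, start_row) - 1, 0)
    for region_start, rows in regions[first:]:
        if stop_row is not None and region_start >= stop_row:
            return  # this region and every later one lie past the stop row
        for row in rows[bisect_left(rows, start_row):]:
            if stop_row is not None and row >= stop_row:
                return
            yield row


if __name__ == "__main__":
    # A single scan over ["b", "n") crosses from the first region into the second.
    print(list(scan(REGIONS, "b", "n")))
```

With one scanner per map task, each task would instead call scan with its own region's start and stop keys, which is the 1:1 layout recommended at the top of the message.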
