In my job, I can tell the MR job the startRow and endRow, i.e. a row range. The MR job then scans only the region(s) that fall in that range; it does not scan from the beginning of the table or tablet/region to the end.
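
To illustrate the idea of a row-range scan without depending on any particular HBase version, here is a minimal sketch in plain Java. It is not HBase code: a TreeMap stands in for the table sorted by row key, and the class name RowRangeScanSketch, the method scanRange and the sample keys are all made up for the example. It assumes the end row is exclusive, and that keys are plain ASCII so that String ordering matches byte ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a table sorted by row key (not the HBase API).
public class RowRangeScanSketch {

    // Visits only the rows whose key is in [startRow, endRow).
    // Rows outside the range are never touched, unlike a filter,
    // which would still be handed every row in the table.
    static List<String> scanRange(TreeMap<String, String> table,
                                  String startRow, String endRow) {
        List<String> visited = new ArrayList<>();
        // subMap returns a view of the sorted map limited to the key range:
        // start inclusive, end exclusive.
        for (Map.Entry<String, String> row
                : table.subMap(startRow, true, endRow, false).entrySet()) {
            visited.add(row.getKey());
        }
        return visited;
    }

    // Builds a small sample table; keys and values are illustrative only.
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("row-20090227", "a");
        table.put("row-20090301", "b");
        table.put("row-20090303", "c");
        table.put("row-20090305", "d");
        table.put("row-20090310", "e");
        return table;
    }

    public static void main(String[] args) {
        // Only the keys inside the requested range are visited.
        System.out.println(scanRange(sampleTable(), "row-20090301", "row-20090305"));
    }
}
```

In a real job the same startRow/endRow pair would be handed to the table input side of the MR job, so that only the regions overlapping the range become map tasks; how that is wired up depends on the HBase release.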
So, Slava, you should modify the code of your MR job to do what you want.

Schubert

On Wed, Mar 4, 2009 at 4:58 PM, Slava Gorelik <[email protected]> wrote:

> Hi. I'm confused a little bit.
>
> Please correct me if I'm wrong, but the MR job itself is "scanning" all
> rows in the table. The job is spread across the region servers, into
> multiple threads. Each thread gets some part of the rows that are placed
> on a particular region server. So, the MR job is finished when all
> threads have passed over all rows. Filtering will help the MR job only to
> filter out non-relevant rows, but anyway those rows will be checked
> (passed to the filter). This does not help a lot; the job is still
> passing over all rows in the table. Calling a scanner inside the MR job
> will not prevent the job from passing over all rows, it will simply make
> the job heavier (as I understand it). Is that correct, Michael?
>
> So, my question is: how can I tell the MR job to pass over some rows and
> not all rows?
>
> Thank you and best regards.
> Slava.
>
>
> On Wed, Mar 4, 2009 at 8:57 AM, stack <[email protected]> wrote:
>
> > On Tue, Mar 3, 2009 at 6:17 PM, schubert zhang <[email protected]> wrote:
> >
> > > Yes, we can tell HBase API only scan rows start with a key.
> > >
> >
> > Would the filtering feature help here?
> >
> > http://hadoop.apache.org/hbase/docs/r0.19.0/api/org/apache/hadoop/hbase/filter/package-summary.html#package_description
> >
> > Scanners can be passed a filter (Read the description section on the
> > above url).
> >
> > > Can any expert share your ideas about:
> > > 1. If the rowkey is not chronological, how can I only process the newly
> > > added/updated rows?
> > >
> >
> > We don't have a means of asking for versions before a timestamp, only
> > older (Can you add timestamp to your row key if you need this?)
> >
> > > 2. How can I remove the old rows which are inserted three months ago?
> > >
> >
> > See above.
> >
> > St.Ack
> >
