Yes, we can tell the HBase API to scan only the rows from a given start key onward, or only the rows within a [startRow, stopRow) range:

    // Get rows from startRow to the end of the table.
    HTable.getScanner(final byte[][] columns, final byte[] startRow)
    // Get rows from startRow to the end of the table; only cells with time stamp <= timestamp are retrieved.
    HTable.getScanner(final byte[][] columns, final byte[] startRow, long timestamp)
    // Get the row range [startRow, stopRow).
    HTable.getScanner(final byte[][] columns, final byte[] startRow, final byte[] stopRow)
    // Get the row range [startRow, stopRow); only cells with time stamp <= timestamp are retrieved.
    HTable.getScanner(final byte[][] columns, final byte[] startRow, final byte[] stopRow, final long timestamp)

Could any expert share ideas about the following?
1. If the rowkey is not chronological, how can I process only the newly added or updated rows?
2. How can I remove the old rows that were inserted three months ago?

Schubert

On Wed, Mar 4, 2009 at 3:10 AM, Slava Gorelik <[email protected]> wrote:
> Thank you for the answer. How can you tell MR jobs which rows you want to
> get? Is it possible to tell an MR job to give me only rows that start with
> some key?
>
> Best Regards.
> Slava
>
> On Tue, Mar 3, 2009 at 7:33 PM, schubert zhang <[email protected]> wrote:
>
> > In my practice, I define the 'time' as the first part of the rowkey, so
> > that I can process only the newly added rows.
> > I think my practice is not good, and it is not appropriate for other
> > cases, since the rowkey definition is so important.
> > I would also like to hear any good ideas.
> >
> > Another question: how can I remove all rows that were inserted three
> > months ago?
> >
> > On Wed, Mar 4, 2009 at 12:45 AM, Slava Gorelik <[email protected]> wrote:
> >
> > > Hi. I have a small question about MR jobs. Is it possible to run an MR
> > > job on part of a table?
> > > For example, I have an MR job running on a table, and the next time I
> > > run this job, I want to get only the newly added or updated rows.
> > >
> > > Thank You and Best Regards.
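To make the time-first rowkey idea from this thread concrete, here is a minimal sketch. It does not use the HBase client: a java.util.TreeMap stands in for the table (HBase also keeps rows sorted by key), and `subMap`/`tailMap` mimic the `[startRow, stopRow)` and start-row-to-end scans in the getScanner signatures listed above. The key layout (zero-padded epoch millis, `_`, an id) and the names `rowKey`, `scan` and `scanFrom` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * In-memory model of a table sorted by row key. TreeMap stands in for HTable;
 * no HBase client classes are used here.
 */
public class TimePrefixedScan {
    // Hypothetical key layout: 13-digit zero-padded epoch millis, '_', then an id,
    // so that lexicographic key order matches chronological order.
    static String rowKey(long epochMillis, String id) {
        return String.format("%013d_%s", epochMillis, id);
    }

    // Keys in [startRow, stopRow), like getScanner(columns, startRow, stopRow).
    static List<String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        SortedMap<String, String> range = table.subMap(startRow, stopRow);
        return new ArrayList<>(range.keySet());
    }

    // Keys from startRow (inclusive) to the end, like getScanner(columns, startRow).
    static List<String> scanFrom(TreeMap<String, String> table, String startRow) {
        return new ArrayList<>(table.tailMap(startRow).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(rowKey(1000L, "a"), "old");
        table.put(rowKey(2000L, "b"), "old");
        table.put(rowKey(3000L, "c"), "new");
        table.put(rowKey(4000L, "d"), "new");

        // The previous job run covered everything before t=3000; pick up only newer rows.
        long lastRun = 3000L;
        System.out.println(scanFrom(table, rowKey(lastRun, "")));

        // A bounded time window [2000, 4000).
        System.out.println(scan(table, rowKey(2000L, ""), rowKey(4000L, "")));
    }
}
```

Zero-padding matters because HBase compares row keys as raw bytes, so an unpadded "900" would sort after "1000". A commonly cited drawback of time-first keys is that all new writes go to the same region, which matches Schubert's own caveat that this design does not suit every case.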

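For rowkeys that are not chronological, one option is to rely on the timestamp HBase stores with every cell. The `timestamp` argument in the getScanner signatures above is an upper bound (only cells with time stamp <= timestamp), so on its own it cannot select rows newer than the last run; a job would have to scan the table and filter on cell timestamps. The sketch below models that, plus an age-based purge, using an in-memory map from row key to the row's latest write time. The names `changedSince` and `purgeOlderThan`, the map model, and approximating "three months" as 90 days are all assumptions, not HBase API.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory model (not the HBase client): row key -> epoch millis of the row's
 * latest cell write. Keys are arbitrary and not time-ordered.
 */
public class TimestampMaintenance {
    // Question 1: a key range cannot isolate new rows here, so walk every row
    // and keep the ones written at or after lastRunMillis.
    static List<String> changedSince(Map<String, Long> table, long lastRunMillis) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Long> e : table.entrySet()) {
            if (e.getValue() >= lastRunMillis) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    // Question 2: remove rows whose latest write is older than maxAge; returns the count removed.
    static int purgeOlderThan(Map<String, Long> table, long nowMillis, Duration maxAge) {
        long cutoff = nowMillis - maxAge.toMillis();
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = table.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() < cutoff) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        long day = Duration.ofDays(1).toMillis();
        long now = 200 * day;
        Map<String, Long> table = new TreeMap<>();
        table.put("user-42", now - 100 * day); // older than 90 days
        table.put("user-07", now - 10 * day);
        table.put("user-99", now - day);

        System.out.println(changedSince(table, now - 5 * day));
        // "Three months" approximated as 90 days (assumption).
        System.out.println(purgeOlderThan(table, now, Duration.ofDays(90)));
        System.out.println(table.keySet());
    }
}
```

A full scan like this touches every row on each run, which is the cost of not encoding time in the key. HBase column families can also be given a time-to-live (TTL), after which cells expire without a separate delete job; check the `HColumnDescriptor` documentation for your version before relying on it.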