And if you go with the timestamp, there is an open issue to deal with this problem:
HBASE-1170

If you want to keep the data for a set time, there is always the TTL option on the table's column families.

Billy


"stack" <[email protected]> wrote in message news:[email protected]...
I think time as part of the row key will be a fairly common practice; if it
suits your access pattern, go for it.

Regarding how to get rid of all rows inserted three months ago: since your
keys have the timestamp embedded, can you not scan your table, deleting all
rows with timestamps older than 3 months? Or alter your table, adding a TTL
of 3 months on the column family, and then bring your table back online. At
the next major compaction (once a day by default), cells older than 3 months
will be deleted.
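
As a rough sketch of the scan-and-delete idea above: if the row keys start with a fixed-width, zero-padded timestamp, everything older than the cutoff sorts before a single stop row. The key layout (13-digit epoch milliseconds followed by a suffix), the 90-day stand-in for "3 months", and the function names are all assumptions for illustration, and the list comprehension only mimics what a real table scan up to that stop row would return.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
KEY_WIDTH = 13  # assumed: zero-padded epoch milliseconds, so keys sort by time


def time_prefix(ts):
    """Zero-padded epoch-millisecond prefix for a row key (assumed layout)."""
    millis = (ts - EPOCH) // timedelta(milliseconds=1)  # exact integer division
    return str(millis).zfill(KEY_WIDTH)


def expired_keys(sorted_keys, now, max_age=timedelta(days=90)):
    """Return the keys whose time prefix is older than now - max_age.

    Mimics a scan from the start of the table up to (but excluding) a stop
    row; each returned key is a candidate for deletion.
    """
    stop_row = time_prefix(now - max_age)
    return [key for key in sorted_keys if key < stop_row]
```

In a real job the same stop row would be handed to the scanner, and each returned row deleted, rather than filtering a list in memory.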

St.Ack

On Tue, Mar 3, 2009 at 9:33 AM, schubert zhang <[email protected]> wrote:

In my practice, I define the 'time' as the first part of the rowkey, so that
I can process only the newly added rows.
I think my practice is not good and not appropriate for other cases, since
the rowkey definition is so important.
I would also like to hear any good ideas.
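
A minimal sketch of the time-first rowkey approach described above, under assumed details: the key is a 13-digit zero-padded epoch-millisecond prefix plus a suffix, and the names `make_row_key` and `rows_since` are hypothetical. A sorted list stands in for the table, and the binary search plays the role of a scan that begins at a start row derived from the previous run's time.

```python
import bisect
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis(ts):
    """Whole milliseconds since the Unix epoch, computed without floats."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


def make_row_key(ts, suffix):
    """Build a time-first row key (assumed layout: 13-digit millis + suffix)."""
    return f"{epoch_millis(ts):013d}-{suffix}"


def rows_since(sorted_keys, last_run):
    """Return keys written at or after last_run.

    The start row is just the time prefix; every key from that instant on
    sorts at or after it, so one range read picks up only the new rows.
    """
    start_row = f"{epoch_millis(last_run):013d}"
    first = bisect.bisect_left(sorted_keys, start_row)
    return sorted_keys[first:]
```

Note that this only finds rows whose key time is new; a row updated in place under an old key would not be picked up by such a range.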

Another question is, how can I remove all rows which were inserted more than
three months ago?

On Wed, Mar 4, 2009 at 12:45 AM, Slava Gorelik <[email protected]> wrote:

> Hi. I have a small question about MR jobs. Is it possible to run an MR job on
> part of the table?
> For example, I have an MR job running on a table, and the next time I run this
> job, I want to get only newly added or updated rows.
>
> Thank You and Best Regards.
>



