You can configure how many versions of a cell HBase keeps when you set up
your table's schema (the limit is set per column family). For example, if you
set a family to keep only 3 versions, then on the next major compaction (by
default every 24 hours), versions beyond the newest 3 are discarded.
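To make the retention rule concrete, here is a toy in-memory model of a single
cell. It is not HBase code: the class CellVersions and its methods are
hypothetical, and majorCompact() only mimics the effect described above (keep
the newest N versions, drop the rest). In a real table the limit is the
VERSIONS attribute of a column family, e.g. in the HBase shell:
create 'webpages', {NAME => 'content', VERSIONS => 3} (table and family names
made up).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model of the versions stored for ONE cell (row, family, qualifier).
// This is not HBase code; it only illustrates the max-versions retention rule.
class CellVersions {
    private final int maxVersions;
    // timestamp -> value, kept newest first
    private final NavigableMap<Long, String> versions =
            new TreeMap<Long, String>(Comparator.<Long>reverseOrder());

    CellVersions(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Each write adds a version under its timestamp
    // (in this toy model, an identical timestamp replaces that version).
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    int storedVersionCount() {
        return versions.size();
    }

    // Stand-in for a major compaction: keep only the newest maxVersions versions.
    void majorCompact() {
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // in descending order the oldest version is last
        }
    }

    // Timestamps currently stored, newest first.
    List<Long> timestamps() {
        return new ArrayList<>(versions.keySet());
    }
}
```

Writing versions with timestamps 1 through 5 into a CellVersions(3) stores all
five; after majorCompact() only timestamps [5, 4, 3] remain.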
St.Ack


On Mon, Sep 14, 2009 at 2:32 AM, Chen Xinli <[email protected]> wrote:

> Hi ,
>
> Since HBase only does insertions, how does it clean up garbage data?
>
> I will have a table storing several hundred million webpages, with several
> million pages updated per day. Will there be any problem?
>
> Thanks
>
>
>
> 2009/9/3 Jonathan Gray <[email protected]>
>
> > Kevin,
> >
> > Not sure I follow the use case 100% but I think you're on the right
> > track.
> >  There are no UPDATES or mutations of any kind in HBase, only INSERTS.  A
> > delete is actually the insertion of a DELETE record.
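A rough sketch of the "delete is an insert" idea, as a toy Java model rather
than HBase's actual storage code (RowLog and its methods are hypothetical):
deleteRow() appends a marker record, reads hide anything at or before the
newest marker, and only the simulated major compaction physically removes data.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one row's stored records; NOT HBase internals.
// A delete does not erase anything: it appends a marker record.
class RowLog {
    static final class Entry {
        final long ts;
        final boolean deleteMarker;
        final String value;

        Entry(long ts, boolean deleteMarker, String value) {
            this.ts = ts;
            this.deleteMarker = deleteMarker;
            this.value = value;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void put(long ts, String value) {
        entries.add(new Entry(ts, false, value));
    }

    // A row delete is itself an insert: a marker masking everything at or before ts.
    void deleteRow(long ts) {
        entries.add(new Entry(ts, true, null));
    }

    int storedEntryCount() {
        return entries.size();
    }

    // Timestamp of the newest delete marker, or Long.MIN_VALUE if there is none.
    private long newestDeleteTs() {
        long newest = Long.MIN_VALUE;
        for (Entry e : entries) {
            if (e.deleteMarker && e.ts > newest) {
                newest = e.ts;
            }
        }
        return newest;
    }

    // Reads skip the markers and every value the newest marker masks.
    List<String> visibleValues() {
        long cutoff = newestDeleteTs();
        List<String> out = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.deleteMarker && e.ts > cutoff) {
                out.add(e.value);
            }
        }
        return out;
    }

    // Stand-in for a major compaction: physically drop masked values and the markers.
    void majorCompact() {
        long cutoff = newestDeleteTs();
        entries.removeIf(e -> e.deleteMarker || e.ts <= cutoff);
    }
}
```

With put(1, "a"), put(2, "b"), deleteRow(3), put(4, "c"), four records are
stored but only "c" is visible; after majorCompact() only that one record
remains.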
> >
> > One thing to be cautious of: there can be indeterminate behavior if you
> > manually set the version timestamps of your cells while doing row/family
> > deletes. If you don't set the timestamp manually (you have the timestamp
> > in the row key, so I'm thinking you don't), then you don't need to worry
> > about it.
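One way to see the timestamp caveat, again as a toy model and not HBase
internals (TimestampMasking and scenario() are hypothetical): a row delete
stamped at time 100 masks any value whose timestamp is 100 or earlier, even one
written after the delete with a manually chosen older timestamp. Whether that
value shows up then depends on whether a major compaction happened to run in
between, which is one way the indeterminate behavior described above can arise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model, NOT HBase internals: shows why manually chosen timestamps
// interact badly with row deletes.
class TimestampMasking {
    private final TreeMap<Long, String> puts = new TreeMap<>();
    private long newestDeleteTs = Long.MIN_VALUE; // no delete marker yet

    void put(long ts, String value) {
        puts.put(ts, value);
    }

    // The marker masks everything at or before ts; nothing is removed yet.
    void deleteRow(long ts) {
        newestDeleteTs = Math.max(newestDeleteTs, ts);
    }

    // A value is visible only if its timestamp is newer than the marker,
    // regardless of the order in which the client issued the calls.
    List<String> visibleValues() {
        return new ArrayList<>(puts.tailMap(newestDeleteTs, false).values());
    }

    // Stand-in for a major compaction: drop masked values, then forget the marker.
    void majorCompact() {
        puts.headMap(newestDeleteTs, true).clear();
        newestDeleteTs = Long.MIN_VALUE;
    }

    // The same client calls, with or without a compaction in between.
    static List<String> scenario(boolean compactBetween) {
        TimestampMasking row = new TimestampMasking();
        row.put(10, "old");
        row.deleteRow(100);      // e.g. a row delete stamped "now" = 100
        if (compactBetween) {
            row.majorCompact();
        }
        row.put(50, "new");      // written later, but with a manual, older timestamp
        return row.visibleValues();
    }
}
```

scenario(false) returns an empty list, while scenario(true) returns [new]: the
same sequence of client calls gives different answers depending on compaction
timing.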
> >
> > JG
> >
> >
> > Kevin Peterson wrote:
> >
> >> I think it is not possible to change the primary key of a row, and that I
> >> need to copy any data I want over to a row with the new key and then
> >> delete the old one, but I wanted to check.
> >>
> >> I'm planning on creating my table storing spidered blog content, building
> >> the primary key from the timestamp of when an article was posted and our
> >> unique article key. This seems like the right approach because it matches
> >> our access pattern when processing large amounts of data. The reason I
> >> need to be able to change the primary key is that when we get an item from
> >> multiple sources (e.g. maybe we picked it up from Digg and directly from
> >> the RSS feed), we don't always favor the first one we downloaded, and
> >> sometimes we see different dates.
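A sketch of the composite key Kevin describes, assuming (hypothetically) an
8-byte big-endian post time in milliseconds followed by the UTF-8 article id.
HBase orders row keys by comparing their bytes as unsigned values, so with this
layout rows sort by post time first; compareRowKeys() reproduces that ordering
in plain Java so the example runs without HBase. The names and layout are
illustrative, not an HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class RowKeys {
    // Hypothetical layout: 8-byte big-endian post time (ms) + UTF-8 article id.
    static byte[] rowKey(long postedAtMillis, String articleId) {
        if (postedAtMillis < 0) {
            throw new IllegalArgumentException("negative timestamps would sort after positive ones");
        }
        byte[] id = articleId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + id.length) // ByteBuffer is big-endian by default
                .putLong(postedAtMillis)
                .put(id)
                .array();
    }

    // Lexicographic comparison of unsigned bytes, the order in which HBase sorts row keys.
    static int compareRowKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

Negative timestamps are rejected because their sign bit would make them sort
after every positive one. The unsigned comparison matters even for positive
values: a post time of 128 ends in byte 0x80, which a signed comparison would
wrongly place before a post time of 1.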
> >>
> >> Does deleting the row and reinserting sound like the right approach?
> >>
> >> (If it matters, I'm playing with 0.20 RC2 right now.)
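The copy-then-delete approach, sketched against a hypothetical in-memory
ToyTable rather than the real client. With the 0.20 client the same steps would
be an HTable.put(Put) for the new key followed by an HTable.delete(Delete) for
the old one; since HBase only guarantees atomicity within a single row, the two
steps are not one transaction. Writing the new row first means an interruption
leaves a duplicate rather than a lost article.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for a table: row key -> (column -> value). Not the HBase client API.
final class ToyTable {
    private final TreeMap<String, Map<String, String>> rows = new TreeMap<>();

    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, k -> new TreeMap<>()).put(column, value);
    }

    Map<String, String> get(String row) {
        return rows.get(row);
    }

    void deleteRow(String row) {
        rows.remove(row);
    }

    // "Renaming" a row: copy every column to the new key, then delete the old row.
    // The new row is written first so an interruption leaves a duplicate, not a loss.
    void moveRow(String oldKey, String newKey) {
        Map<String, String> cells = rows.get(oldKey);
        if (cells == null || oldKey.equals(newKey)) {
            return; // nothing to move, or moving onto itself would delete the data
        }
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            put(newKey, cell.getKey(), cell.getValue());
        }
        deleteRow(oldKey);
    }
}
```

For example, after moveRow("1000/post-1", "0900/post-1") the old key returns
null and the new key holds every column the old row had (keys made up for
illustration).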
> >>
> >>
>
>
> --
> Best Regards,
> Chen Xinli
>
