You can configure how many versions of a cell HBase should keep when you set up your table's schema. For example, if you set a column family in your table to keep only 3 versions, then on the next major compaction (by default, every 24 hours) any versions in excess of 3 will be discarded.

St.Ack
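To make the retention rule concrete, here is a toy Java model of what a major compaction does to the stored entries of a single cell. It is not HBase code: the class, record and method names are made up for this sketch. In the real client API, the limit is a per-column-family setting such as `HColumnDescriptor.setMaxVersions(3)`.

The model also covers the point made further down the thread, that a delete is just an inserted DELETE record. It simplifies HBase's delete semantics by assuming every delete marker masks all values at or before its timestamp. Under that assumption, compaction drops the markers, the values they mask, and any versions beyond the configured limit.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Toy model of one cell's stored entries; this is NOT HBase's implementation. */
public class VersionPruningSketch {

    /** A stored value or delete marker for one cell, tagged with its timestamp. */
    record Entry(long timestamp, String value, boolean isDelete) {}

    /**
     * Returns what survives a simplified major compaction, newest first.
     * Assumption: each delete marker masks every value at or before its timestamp.
     */
    static List<Entry> majorCompact(List<Entry> entries, int maxVersions) {
        long newestDelete = entries.stream()
                .filter(Entry::isDelete)
                .mapToLong(Entry::timestamp)
                .max()
                .orElse(Long.MIN_VALUE); // no markers: nothing is masked
        return entries.stream()
                .filter(e -> !e.isDelete())                // markers are not rewritten
                .filter(e -> e.timestamp() > newestDelete) // masked values are dropped
                .sorted(Comparator.comparingLong(Entry::timestamp).reversed()) // newest first
                .limit(maxVersions)                        // versions beyond the limit are dropped
                .collect(Collectors.toList());
    }

    /** Convenience helper: just the values, in order. */
    static List<String> values(List<Entry> entries) {
        return entries.stream().map(Entry::value).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Five versions of one cell in a family configured to keep 3.
        List<Entry> cell = List.of(
                new Entry(1, "v1", false), new Entry(2, "v2", false),
                new Entry(3, "v3", false), new Entry(4, "v4", false),
                new Entry(5, "v5", false));
        System.out.println(values(majorCompact(cell, 3))); // [v5, v4, v3]

        // A delete marker at t=2 masks v1 and v2; the marker itself is also purged.
        List<Entry> withDelete = List.of(
                new Entry(1, "v1", false), new Entry(2, "v2", false),
                new Entry(2, null, true), new Entry(3, "v3", false));
        System.out.println(values(majorCompact(withDelete, 3))); // [v3]
    }
}
```

Between compactions, the old versions and delete markers still occupy space on disk. This is why a table with heavy daily rewrites grows until the next major compaction runs.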
On Mon, Sep 14, 2009 at 2:32 AM, Chen Xinli wrote:
> Hi,
>
> As there are only insertions in HBase, how does HBase clean up garbage data?
>
> I will have a table storing several hundred million web pages, and several million pages are updated per day. Will there be any problem?
>
> Thanks
>
> 2009/9/3 Jonathan Gray wrote:
>
> > Kevin,
> >
> > Not sure I follow the use case 100%, but I think you're on the right track.
> > There are no UPDATES or mutations of any kind in HBase, only INSERTS. A delete is actually the insertion of a DELETE record.
> >
> > One thing to be cautious of: there can be indeterminate behavior if you are manually setting the version timestamps of your cells while doing row/family deletes. If you don't manually set the timestamp (you have the stamp in the key, so I'm thinking you don't), then you don't need to worry about it.
> >
> > JG
> >
> > Kevin Peterson wrote:
> >
> >> I think that it is not possible to change the primary key of a row, and that I need to copy any data I want over to a row with the new key and then delete the old one, but I wanted to check.
> >>
> >> I'm planning on creating my table storing spidered blog content, building the primary key from the timestamp of when an article was posted and our unique article key. This seems the right approach because it matches our access pattern when processing large amounts of data. The reason I need to be able to change the primary key is that when we get an item from multiple sources (e.g. maybe we picked it up from Digg and directly from the RSS feed), we don't always favor the first one we downloaded, and sometimes we see different dates.
> >>
> >> Does deleting the row and reinserting it sound like the right approach?
> >>
> >> (If it matters, I'm playing with 0.20 RC2 right now.)
>
> --
> Best Regards,
> Chen Xinli
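Kevin's copy-then-delete approach to changing a row key can be sketched with a toy Java model. Here a `TreeMap` stands in for a table sorted by row key. The `rowKey` layout (zero-padded, non-negative post time followed by the article key), the `rekey` helper and the column names are all hypothetical illustrations, not HBase API.

In a real HBase client, step 1 would be a put on the new row and step 2 a delete on the old row. HBase guarantees atomicity only within a single row, so a failure between the two steps can leave both rows present.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy sorted "table" (row key -> column -> value); NOT the HBase client API. */
public class RekeySketch {

    /** Hypothetical key layout: zero-padded post time (non-negative ms), then the article key. */
    static String rowKey(long postedAtMillis, String articleKey) {
        // Zero-padding makes lexicographic key order match numeric time order.
        return String.format("%019d-%s", postedAtMillis, articleKey);
    }

    /** Moves a row: copy all columns under newKey, then delete oldKey. */
    static void rekey(NavigableMap<String, Map<String, String>> table, String oldKey, String newKey) {
        Map<String, String> row = table.get(oldKey);
        if (row == null || oldKey.equals(newKey)) {
            return; // nothing to move (and never delete a row onto itself)
        }
        // Step 1: write the columns under the new key (an ordinary insert).
        table.computeIfAbsent(newKey, k -> new TreeMap<>()).putAll(row);
        // Step 2: delete the old row (in HBase, this inserts a delete marker).
        table.remove(oldKey);
    }

    public static void main(String[] args) {
        NavigableMap<String, Map<String, String>> table = new TreeMap<>();
        // First seen via Digg with one post time...
        String fromDigg = rowKey(1252900000000L, "article-42");
        table.put(fromDigg, new TreeMap<>(Map.of("meta:source", "digg")));
        // ...later the RSS feed reports an earlier post time we prefer.
        String fromRss = rowKey(1252890000000L, "article-42");
        rekey(table, fromDigg, fromRss);
        System.out.println(table.containsKey(fromDigg)); // false
        System.out.println(table.containsKey(fromRss));  // true
    }
}
```

In real HBase, the old row's data does not disappear immediately: like any other deleted cells, it is physically removed at the next major compaction.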
