Got it: major compaction does the cleanup. Thanks.

2009/9/14 stack <[email protected]>

> You can configure how many versions of a cell hbase should keep when you set
> up your table's schema. For example, if you set your table to only keep 3
> versions, then on the next major compaction (default every 24 hours),
> versions in excess of 3 will be let go.
> St.Ack
>
>
> On Mon, Sep 14, 2009 at 2:32 AM, Chen Xinli <[email protected]> wrote:
>
> > Hi ,
> >
> > As there is only insertion in hbase, how does hbase clean garbage data?
> >
> > I will have a table storing several hundred million webpages, updation is
> > done for several million pages per day. Will there be any problem?
> >
> > Thanks
> >
> >
> >
> > 2009/9/3 Jonathan Gray <[email protected]>
> >
> > > Kevin,
> > >
> > > Not sure I follow the use case 100% but I think you're on the right track.
> > > There are no UPDATES or mutations of any kind in HBase, only INSERTS. A
> > > delete is actually the insertion of a DELETE record.
> > >
> > > One thing to be cautious of... There can be indeterminate behavior if you
> > > are manually setting the version timestamps of your cells while doing
> > > row/family deletes. If you don't manually set the timestamp (you have stamp
> > > in the key so I'm thinking you don't), then you don't need to worry about
> > > it.
> > >
> > > JG
> > >
> > >
> > > Kevin Peterson wrote:
> > >
> > >> I think that it is not possible change the primary key of a row, and I need
> > >> to copy any data I want over to a row with the new key and then delete the
> > >> old one, but I wanted to check.
> > >>
> > >> I'm planning on creating my table storing spidered blog content building the
> > >> primary key from the timestamp of when an article was posted and our unique
> > >> article key. This seems the right approach because it matches our access
> > >> pattern when processing large amounts of data. The reason I need to be able
> > >> to change the primary key is when we get an item from multiple sources (i.e.
> > >> maybe we picked it up from digg and directly from the RSS feed) we don't
> > >> always favor the first one we downloaded and sometimes we see different
> > >> dates.
> > >>
> > >> Does deleting the row and reinserting sound like the right approach?
> > >>
> > >> (If it matters, I'm playing with 0.20 RC2 right now.)
> > >>
> > >>
> > >
> >
> > --
> > Best Regards,
> > Chen Xinli
> >
>

--
Best Regards,
Chen Xinli
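
For anyone reading this thread later, the behavior described above (inserts only, a delete being the insertion of a DELETE record, and major compaction trimming excess versions) can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `ToyStore`, `TOMBSTONE`, `put`, `delete`, `get`, and `major_compact` are hypothetical names invented here, not the HBase client API, and real compaction works on store files rather than Python lists.

```python
from collections import defaultdict

# Hypothetical marker standing in for a DELETE record (not an HBase type).
TOMBSTONE = object()


class ToyStore:
    """Insert-only toy model of versioned cells. NOT the HBase API."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.cells = defaultdict(list)  # key -> list of (timestamp, value)

    def put(self, key, ts, value):
        # Nothing is overwritten: every write appends a new version.
        self.cells[key].append((ts, value))

    def delete(self, key, ts):
        # A delete is just another insert: a tombstone at timestamp ts.
        self.cells[key].append((ts, TOMBSTONE))

    def get(self, key):
        # The newest version wins; a tombstone on top hides the cell.
        versions = sorted(self.cells.get(key, []), key=lambda c: c[0], reverse=True)
        if not versions or versions[0][1] is TOMBSTONE:
            return None
        return versions[0][1]

    def major_compact(self):
        # Physically drop masked data and versions beyond max_versions.
        for key in list(self.cells):
            versions = sorted(self.cells[key], key=lambda c: c[0], reverse=True)
            kept = []
            for ts, value in versions:
                if value is TOMBSTONE:
                    break  # the tombstone masks every older version
                if len(kept) < self.max_versions:
                    kept.append((ts, value))
            if kept:
                self.cells[key] = kept
            else:
                del self.cells[key]


store = ToyStore(max_versions=3)
for ts in range(1, 6):
    store.put("page-1", ts, "v%d" % ts)  # five versions accumulate
store.major_compact()                    # only the three newest survive
```

In this sketch the garbage (old versions and deleted rows) stays on disk, so to speak, until `major_compact` runs, which mirrors the point that cleanup happens at major compaction rather than at write time.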
