Hi,

As there are only insertions in HBase, how does HBase clean up garbage data?
I will have a table storing several hundred million webpages, with updates to several million pages per day. Will there be any problem?

Thanks

2009/9/3 Jonathan Gray <[email protected]>

> Kevin,
>
> Not sure I follow the use case 100% but I think you're on the right track.
> There are no UPDATES or mutations of any kind in HBase, only INSERTS. A
> delete is actually the insertion of a DELETE record.
>
> One thing to be cautious of... There can be indeterminate behavior if you
> are manually setting the version timestamps of your cells while doing
> row/family deletes. If you don't manually set the timestamp (you have stamp
> in the key so I'm thinking you don't), then you don't need to worry about
> it.
>
> JG
>
>
> Kevin Peterson wrote:
>
>> I think that it is not possible change the primary key of a row, and I need
>> to copy any data I want over to a row with the new key and then delete the
>> old one, but I wanted to check.
>>
>> I'm planning on creating my table storing spidered blog content building the
>> primary key from the timestamp of when an article was posted and our unique
>> article key. This seems the right approach because it matches our access
>> pattern when processing large amounts of data. The reason I need to be able
>> to change the primary key is when we get an item from multiple sources (i.e.
>> maybe we picked it up from digg and directly from the RSS feed) we don't
>> always favor the first one we downloaded and sometimes we see different
>> dates.
>>
>> Does deleting the row and reinserting sound like the right approach?
>>
>> (If it matters, I'm playing with 0.20 RC2 right now.)
>>

--
Best Regards,
Chen Xinli
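
To make the question concrete, here is a toy Python model of the insert-only behaviour described in the quoted reply: an update is just a newer insert, a delete is an inserted delete record, and stale cells stay in storage until a cleanup pass runs. As far as I know, HBase does this kind of cleanup during compaction, but the sketch below is only an illustration under that assumption. `ToyStore`, `Cell`, and `compact` are hypothetical names, not HBase classes, and the model keeps only the newest version per row, which is simpler than HBase's versioning.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    row: str
    timestamp: int
    value: Optional[bytes]  # None marks a delete record (tombstone)


class ToyStore:
    """Append-only cell log. Hypothetical stand-in, not HBase code."""

    def __init__(self) -> None:
        self.log: List[Cell] = []
        self._clock = 0  # logical clock standing in for cell timestamps

    def _next_ts(self) -> int:
        self._clock += 1
        return self._clock

    def put(self, row: str, value: bytes) -> None:
        # An "update" is simply another insert with a newer timestamp.
        self.log.append(Cell(row, self._next_ts(), value))

    def delete(self, row: str) -> None:
        # A delete is the insertion of a delete record, nothing is erased yet.
        self.log.append(Cell(row, self._next_ts(), None))

    def _newest(self) -> Dict[str, Cell]:
        newest: Dict[str, Cell] = {}
        for cell in self.log:
            current = newest.get(cell.row)
            if current is None or cell.timestamp > current.timestamp:
                newest[cell.row] = cell
        return newest

    def get(self, row: str) -> Optional[bytes]:
        # Reads see only the newest cell; a tombstone reads as "no value".
        cell = self._newest().get(row)
        return None if cell is None else cell.value

    def compact(self) -> None:
        # Cleanup pass: drop overwritten cells, tombstones, and what they mask.
        self.log = [c for c in self._newest().values() if c.value is not None]


store = ToyStore()
store.put("page1", b"v1")
store.put("page1", b"v2")   # overwrite: the old cell is now garbage
store.put("page2", b"x")
store.delete("page2")       # tombstone: both page2 cells are now garbage
cells_before = len(store.log)
store.compact()
cells_after = len(store.log)
```

In this model reads return the same results before and after `compact()`; only the amount of stored data changes, which is the part of the behaviour the question is about.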
