Thanks Jonathan, that advice is helpful. I've seen 0.20 mentioned a few times on the list - is this a reference to current SVN HEAD, and if so is it considered sufficiently stable to be deployable?

--
Jon Schutz                   My tech notes http://notes.jschutz.net
Chief Technology Officer     http://www.youramigo.com
YourAmigo

Jonathan Gray wrote:
> Jon,
>
> Prior to 0.20, I would definitely recommend moving the time component to
> the keys, columns, and values. Even after 0.20, I recommend doing that
> if you want complete control. My personal philosophy is that versions
> are for versioning, and if you are really using them as a time dimension
> of individual data points, you should consider not using versions.
>
> However, the API and server-side implementation for versions is greatly
> improved. You can specify stamps manually and you can query for any
> range you want, gets and scans.
>
> There is not currently a way to keep versions < x weeks old but always
> keep the latest version. If you wanted to enforce something like that,
> you could always write a MapReduce job that ran periodically and
> enforced what you wanted.
>
> If you want to keep history forever, the idea is to use the "big enough"
> values. In practice, only since HBase 0.20 have we been able to handle
> millions of versions of a single column (Integer.MAX_VALUE is >2
> billion, far beyond the capabilities of HBase). The same goes for
> TTL... 2 billion seconds is over 60 years. Could also move everything
> to Long which would ensure there would never be an issue. Will dig more
> and let you know.
>
> In any case, you'll need 0.20 to fully take advantage of versions.
>
> Hope that helps.
>
> JG
>
> Jon Schutz wrote:
>> How do TTL and Versions specifications interact? I'm guessing that the
>> first limit reached applies, i.e. if TTL is 1 week and versions is 3,
>> adding a fourth update to a data record would cause the first to be
>> bumped even if it is less than a week old? And if I only have 2
>> versions but one is 2 weeks old, the expired one gets bumped even though
>> the versions limit has not been reached?
>>
>> Is there a way to say "Keep versions < x weeks old, but always keep at
>> least the latest version, no matter how old?"
>>
>> Suppose I want to keep the history about a particular object forever.
>> Looks like TTL can be set to 'Forever' (-1) but Versions has no
>> 'infinite' setting - I guess that's OK as in practice MAXINT is "big
>> enough". Would it be wise to use Hbase like this to maintain a history,
>> or should I be adding a time component into the key and storing multiple
>> records? Can anyone help outline the pros and cons?
>>
>> Thanks,
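
For anyone following the thread, here is a rough sketch of the retention rule discussed above ("keep versions < x weeks old, but always keep the latest"), which JG suggests enforcing with a periodic job. It is plain Python and only models the selection logic. It does not use any HBase API and is not how HBase itself applies TTL or the versions limit. The function name `versions_to_keep` and its parameters are made up for illustration.

```python
from datetime import datetime, timedelta


def versions_to_keep(timestamps, now, max_age):
    """Pick which version timestamps a periodic cleanup job would retain.

    Hypothetical helper: keeps every version newer than `max_age`,
    and always keeps the most recent version regardless of its age.
    """
    if not timestamps:
        return []
    cutoff = now - max_age          # anything older than this has expired
    latest = max(timestamps)        # newest version is never dropped
    kept = [ts for ts in timestamps if ts >= cutoff or ts == latest]
    return sorted(kept, reverse=True)  # newest first


# Example: a two-week window applied to one cell's version history.
now = datetime(2009, 6, 1)
history = [now - timedelta(days=1), now - timedelta(days=10), now - timedelta(days=30)]
recent = versions_to_keep(history, now, timedelta(weeks=2))

# Example: every version has expired, so only the newest one survives.
stale = versions_to_keep(
    [now - timedelta(days=30), now - timedelta(days=60)], now, timedelta(weeks=2)
)
```

A real job would read each cell's versions, apply a rule like this, and delete the timestamps that are not returned. How the reads and deletes are issued depends on the HBase version in use.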
