I've read numerous threads on this mailing list and asked several times on IRC, but the answers I get are rarely the same, so I'd like to try once more.
I have a data model that would be a perfect match for the versions/timestamps that are available in HBase. Some say that it is perfectly feasible to use the versions as another "data dimension", and some say that they aren't meant to be used that way at all. The BigTable paper doesn't go into much detail about this, but from what I gathered versions are indeed used there as an additional dimension.

In my data model the versions would start at 1 and ascend. They would not be timestamps, but HBase doesn't enforce that. The upside of this model would be that only the difference between two versions would have to be saved, and that I'd be provided with a nice API to handle versions. The model that has been proposed to me numerous times, a compound row key (model id:version), would save duplicates of the data (or I'd have to handle the diffs myself). Another upside of the version model would be that a single Get would be enough to retrieve an element and its history.

I require "out of order" insertion of versions, and I was told that this is probably no problem as long as I don't delete a version. Is this true?

I know that there is a limit on the number of versions (Integer.MAX_VALUE as far as I can see), and for some of my tables this will be a problem, so I'd end up using a mix of both models anyway. Where possible, though, I'd like to use the version model provided by HBase.

I haven't seen a single example schema, tutorial, or similar that talks about versions in schemas; they seem to go mainly unused. So my question is: should I use versions as an important part of my schema or not? If not, are there any tips or hints on managing versions with compound keys, and what are the versions/timestamps used for if not as an additional data dimension?

One more question about a "proper" schema: I have quite a lot of places where a row merely stores a list of the things it relates to, without requiring any additional information (many-to-many).
I'd have introduced a new column family and used the column names as keys into another table, but I won't need the column values. How does HBase behave with regard to "null" as a column value? The FAQ entry about this topic is a bit unclear. Or is this the wrong approach to begin with?

Thanks for your help!

Lars
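
P.S. To make the compound-key alternative concrete, this is roughly how I understand it, written as a small Python sketch. It is only an in-memory model of a table sorted by row key, not the HBase client API. The names row_key, put, history and materialize, the 10-digit zero padding, and the assumption that a model id never contains ":" are all mine. It also shows the "handle the diffs myself" part: each row stores only the changed fields, and a state is rebuilt by applying the diffs in version order.

```python
# In-memory stand-in for a table sorted by row key. NOT the HBase client API.
WIDTH = 10  # assumed padding width; 10 digits covers Integer.MAX_VALUE


def row_key(model_id, version):
    # Zero-pad the version so that the lexicographic order of the keys
    # matches the numeric order of the versions.
    return f"{model_id}:{version:0{WIDTH}d}"


def put(table, model_id, version, diff):
    # Store only the fields that changed in this version.
    table[row_key(model_id, version)] = diff


def history(table, model_id):
    # Emulates a prefix scan: all rows of one model, in key order.
    prefix = f"{model_id}:"
    return [
        (int(key[len(prefix):]), table[key])
        for key in sorted(table)
        if key.startswith(prefix)
    ]


def materialize(table, model_id, up_to):
    # Rebuild the state at a given version by applying the diffs in order.
    state = {}
    for version, diff in history(table, model_id):
        if version > up_to:
            break
        state.update(diff)
    return state


table = {}
# Versions are inserted out of order on purpose.
put(table, "model42", 3, {"color": "red"})
put(table, "model42", 1, {"name": "widget", "color": "blue"})
put(table, "model42", 10, {"name": "gadget"})
put(table, "model7", 1, {"name": "other"})

print(history(table, "model42"))
print(materialize(table, "model42", 3))
```

In this sketch, out-of-order insertion is harmless because ordering comes from the key itself, but reading the history needs a scan over the key prefix rather than a single Get, which is exactly the trade-off I'm asking about.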
