I've read numerous threads on this mailing list and asked several
times on IRC, but the answers I get are rarely the same, so I'd like to
try once more.

I have a data model that would be a perfect match for the
versions/timestamps that are available in HBase. Some say that it is
perfectly feasible to use the versions as another "data dimension" and
some say that it isn't meant to be used that way at all. The Bigtable
paper doesn't go into much detail about this, but from what I gathered
the timestamp is indeed used there as an additional dimension.

In my data model the versions would start at 1 and ascend. They would
not be real timestamps, but HBase doesn't enforce that. The upside of
this model would be that only the difference between two versions
would have to be saved and that I'd be provided with a nice API to
handle versions. The model proposed to me numerous times, which uses a
compound row key (model id:version), would store duplicates of the
data (or I'd have to handle the diffs myself). Another upside of the
version model would be that a single Get is enough to fetch an element
and its history.
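
To make the compound-key alternative concrete, here is a minimal sketch in plain Java (no HBase client calls). The class name CompoundRowKey, the ':' separator and the 4-byte big-endian version suffix are assumptions made for illustration, not anything HBase prescribes. It relies only on the fact that HBase sorts row keys as raw bytes, and shows why a binary version suffix keeps the versions of one model in numeric order while a decimal string suffix does not.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: builds "<model id>:<version>" row keys.
// Assumes model ids never contain ':' and versions are >= 1.
final class CompoundRowKey {

    // UTF-8 model id, a ':' separator, then the version as 4 big-endian bytes.
    static byte[] encode(String modelId, int version) {
        byte[] id = modelId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(id.length + 1 + Integer.BYTES)
                .put(id)
                .put((byte) ':')
                .putInt(version)
                .array();
    }

    // Reads the version back from the last 4 bytes of the key.
    static int version(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey, rowKey.length - Integer.BYTES, Integer.BYTES).getInt();
    }

    // Unsigned lexicographic byte comparison, the order row keys are sorted in.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] v2 = encode("model42", 2);
        byte[] v10 = encode("model42", 10);
        // Binary suffix: version 2 sorts before version 10.
        System.out.println(compare(v2, v10) < 0);
        // Decimal string suffix: "model42:10" sorts before "model42:2".
        System.out.println("model42:10".compareTo("model42:2") < 0);
    }
}
```

With keys like these, an element and its history would be read with a scan over the "model42:" prefix rather than a single Get.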

I require "out of order" insertion to the versions and I was told that
this is probably no problem as long as I don't delete a version. Is
this true?
I know that there is a limit on the number of versions
(Integer.MAX_VALUE as far as I can see), and for some of my tables this
will be a problem, so I'd end up using a mix of both models anyway.
Where possible, though, I'd like to use the version model provided by
HBase. I haven't seen a single example schema, tutorial, ... that
talks about versions in schemas; they seem to go mainly unused.
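
As a way to pin down what I mean by "out of order" insertion and the delete caveat, here is a toy model in plain Java. It is not HBase code: VersionedCell and its methods are hypothetical names, and the tombstone rule it implements (a delete hides every version at or below its timestamp, including puts that arrive afterwards) is my understanding of how HBase behaves until the next major compaction, so please treat it as an assumption to be confirmed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Toy model of a single cell with explicit, user-supplied versions.
final class VersionedCell {
    // Versions sorted newest first, independent of insertion order.
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());
    // Highest deleted version so far (assumed tombstone semantics).
    private long tombstone = Long.MIN_VALUE;

    void put(long version, String value) {
        versions.put(version, value);
    }

    // Marks every version <= the given one as deleted.
    void deleteUpTo(long version) {
        tombstone = Math.max(tombstone, version);
    }

    // Versions a reader would see, newest first.
    List<Long> visibleVersions() {
        List<Long> out = new ArrayList<>();
        for (long v : versions.keySet()) {
            if (v > tombstone) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(3, "c");
        cell.put(1, "a");
        cell.put(2, "b");          // inserted out of order
        System.out.println(cell.visibleVersions());
        cell.deleteUpTo(2);
        cell.put(2, "b again");    // late put below the tombstone stays hidden
        System.out.println(cell.visibleVersions());
    }
}
```

In this model the order of the puts never matters; only a delete followed by a put at an older or equal version gives a surprising result.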

So my question would be: Should I use versions as an important part of
my schema or not? If not, are there any tips or hints on managing
versions with compound keys, and what are the versions/timestamps
meant for if not as an additional data dimension?

And one more question about a "proper" schema: I have quite a few
places that merely store a list of the things an element relates to,
without requiring any additional information (many-to-many). I would
introduce a new column family and use the column qualifiers as keys
into another table, but I won't need the column value. How does HBase
behave with regard to "null" as a column value? The FAQ entry about
this topic is a bit unclear. Or is this the wrong approach to begin
with?
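
The relation layout I have in mind can be sketched like this in plain Java, again without any HBase client calls. RelationRow and its methods are hypothetical names; the sorted map stands in for one column family of one row, with the related row key as the column qualifier and an empty byte array as a placeholder value. Whether an empty value is the right substitute for "null" in HBase is exactly what I'm asking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one row's "relates to" column family.
final class RelationRow {
    // Placeholder value: the qualifier alone carries the information.
    private static final byte[] EMPTY = new byte[0];

    // Qualifier (key of the related row) -> empty value, sorted by qualifier.
    private final NavigableMap<String, byte[]> relatesTo = new TreeMap<>();

    void link(String otherRowKey) {
        relatesTo.put(otherRowKey, EMPTY);
    }

    boolean isLinked(String otherRowKey) {
        return relatesTo.containsKey(otherRowKey);
    }

    // All related row keys in sorted order.
    List<String> linkedKeys() {
        return new ArrayList<>(relatesTo.keySet());
    }

    public static void main(String[] args) {
        RelationRow row = new RelationRow();
        row.link("tag:hbase");
        row.link("tag:bigtable");
        System.out.println(row.linkedKeys());
        System.out.println(row.isLinked("tag:hbase"));
    }
}
```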

Thanks for your help!

Lars
