Tom,
HBase is certainly capable of doing something like this. And I'm
currently doing things like it in production.
Because HBase is a binary store, you can use any kind of serialized type
you want (we store everything from JSON and protobufs to serialized Java,
Python, and Erlang data structures). The serialization layer often handles
enforcement of type, required/optional fields, length, etc.
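To make that concrete, here is a minimal sketch of app-level enforcement
in Java. Everything in it is an assumption for illustration: UserRecord,
its fields, and the byte layout are hypothetical and not part of HBase.
The point is only that the checks happen in the client before the value
becomes an opaque byte[].

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical app-level record; the name, fields, and layout are
// illustrative only and have nothing to do with the HBase API.
final class UserRecord {
    static final int MAX_NAME_LENGTH = 64;

    final String name;  // required
    final Integer age;  // optional, may be null

    UserRecord(String name, Integer age) {
        // Enforcement lives in the application, not in the store.
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name is required");
        }
        if (name.length() > MAX_NAME_LENGTH) {
            throw new IllegalArgumentException("name too long");
        }
        this.name = name;
        this.age = age;
    }

    // Layout: [name length][name bytes][age-present flag][age, if present]
    byte[] toBytes() {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        int size = 4 + nameBytes.length + 1 + (age != null ? 4 : 0);
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(nameBytes.length).put(nameBytes);
        buf.put((byte) (age != null ? 1 : 0));
        if (age != null) {
            buf.putInt(age);
        }
        return buf.array();
    }

    // Rebuilding through the constructor re-applies the same checks on read.
    static UserRecord fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte[] nameBytes = new byte[buf.getInt()];
        buf.get(nameBytes);
        String name = new String(nameBytes, StandardCharsets.UTF_8);
        Integer age = buf.get() == 1 ? Integer.valueOf(buf.getInt()) : null;
        return new UserRecord(name, age);
    }
}
```

The byte[] from toBytes() is what you would hand to the client as the
cell value. Swapping in JSON or protobufs changes only these two methods.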
What you're asking is whether there is a way to integrate this more
directly, with custom serializers/deserializers that would do the
enforcement?
What I will say is that the new design for 0.20, which we are currently
testing, includes a rework of the client/server protocol towards something
more language-agnostic (likely not fully there for 0.20, but hopefully soon
after). Even in 0.20, though, for PUTs, the actual binary that will
eventually be stored in HFiles (called a KeyValue) is built client-side
and sent to the server. GETs will return the same thing. In both cases,
what you basically send between the client and server are lists of
KeyValues. These can then be built into existing structures like
RowResult/Cell or interpreted in any way you'd like.
That basically means you can serialize and deserialize what you store in
HBase however you want.
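As a rough sketch of "interpreted in any way you'd like", the Java below
folds a flat list of key/values into a row -> column -> newest-entry view.
SimpleKeyValue and RowView are hypothetical stand-ins written for this
example. They are not HBase's KeyValue class or any 0.20 API, and the
field layout is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for illustration only; this is NOT HBase's KeyValue class.
final class SimpleKeyValue {
    final String row;
    final String column;   // assumed "family:qualifier" form
    final long timestamp;
    final byte[] value;    // opaque bytes; the app decides how to decode them

    SimpleKeyValue(String row, String column, long timestamp, byte[] value) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
        this.value = value;
    }
}

final class RowView {
    // Group a flat list by row, keeping only the newest entry per column.
    static Map<String, Map<String, SimpleKeyValue>> newestByRow(
            List<SimpleKeyValue> kvs) {
        Map<String, Map<String, SimpleKeyValue>> rows = new TreeMap<>();
        for (SimpleKeyValue kv : kvs) {
            Map<String, SimpleKeyValue> cols =
                    rows.computeIfAbsent(kv.row, r -> new TreeMap<>());
            SimpleKeyValue seen = cols.get(kv.column);
            if (seen == null || kv.timestamp > seen.timestamp) {
                cols.put(kv.column, kv);
            }
        }
        return rows;
    }
}
```

The client is free to build a different view (typed objects, JSON
documents, and so on) from the same list; the store only sees bytes.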
I've not really found a need to further integrate typed information. But
I also have no problem adding complexity at the app level; that's a
decision we made long ago, and it's what has allowed us to do so much with
HBase.
Putting flexibility into the hands of the client seems like a good way to
go, keeping HBase as simple as possible ("just" a KeyValue store).
JG
On Fri, April 17, 2009 8:21 am, Tom Nichols wrote:
> Hi,
>
>
> I've been using HBase and now I'm looking at Voldemort. What's
> particularly interesting about Voldemort is its typed data model.
> Apparently it involves JSON, but what matters the most to me is that
> it makes storage of complex data types much easier. It is described here:
> http://project-voldemort.com/design.php about half-way down the
> page. Obviously JSON serialization & deserialization adds overhead but the
> ability to choose a strongly-typed storage format seems nice.
>
> Any thoughts of this functionality in the HBase API? Not necessarily
> JSON in particular, but a pluggable serialization/deserialization
> mechanism? I imagine this could be done completely on the client, but
> having something standard so every user doesn't have to roll their own
> (and having the same functionality in HB/MapReduce) would be nice.
>
>
> Thanks.
> -Tom
>
>
>