I thought about protobufs; that's probably the most straightforward option, since it's easily convertible to and from byte arrays. I guess what I'm thinking about is a "standard" serialization mechanism, so I can put a pretty face on the HBase data access API without having to do the serialization and deserialization myself. I'm probably just being lazy :)
On Fri, Apr 17, 2009 at 1:05 PM, Jonathan Gray <[email protected]> wrote:
> Tom,
>
> HBase is certainly capable of doing something like this, and I'm currently doing things like it in production.
>
> As a binary store, you can use any kind of serialized type you want (we store everything from JSON and protobufs to serialized Java, Python, and Erlang data structures). That often includes enforcement of type, required/optional fields, length, etc.
>
> What you're asking is whether there is a way to integrate this more directly, with custom serializers/deserializers that would do the enforcement?
>
> What I will say is that the new design for 0.20, which we are currently testing, includes a rework of the client/server protocol toward something more language-agnostic (likely not fully there for 0.20, but hopefully soon after). Even for 0.20, though, for PUTs the actual binary that will eventually be stored in HFiles (called a KeyValue) is built client-side and sent to the server, and GETs will return the same thing. In both cases, what you basically send between the client and server are lists of KeyValues; these can then be built into existing structures like RowResult/Cell or interpreted in any way you'd like.
>
> That basically means you can do anything you want as far as the serialization/deserialization of what you're storing in HBase goes.
>
> I've not really found a need to further integrate typed information... but I also have no problem adding complexity at the app level. That's a decision made long ago, and it's what has allowed us to do so much with HBase.
>
> Putting flexibility into the hands of the client seems like a good way to go, keeping HBase as simple as possible ("just" a KeyValue store).
>
> JG
>
> On Fri, April 17, 2009 8:21 am, Tom Nichols wrote:
>> Hi,
>>
>> I've been using HBase and now I'm looking at Cassandra. What's particularly interesting about Cassandra is its typed data model. Apparently it involves JSON, but what matters most to me is that it makes storage of complex data types much easier. It is described here: http://project-voldemort.com/design.php about half-way down the page. Obviously JSON serialization & deserialization adds overhead, but the ability to choose a strongly-typed storage format seems nice.
>>
>> Any thoughts on this functionality in the HBase API? Not necessarily JSON in particular, but a pluggable serialization/deserialization mechanism? I imagine this could be done completely on the client, but having something standard so every user doesn't have to roll their own (and having the same functionality in HB/MapReduce) would be nice.
>>
>> Thanks.
>> -Tom
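A minimal sketch of the kind of client-side, pluggable serialization layer discussed in this thread, written in Java since that is HBase's client language. `Serializer`, `User`, `UserSerializer`, and `TypedStore` are hypothetical names, not part of the HBase API, and a `HashMap` stands in for an HBase table so the example runs on its own. In practice the body of `UserSerializer` could be swapped for protobuf or JSON encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical pluggable serializer: converts a typed value to and from the
// raw byte[] that a binary store such as HBase actually holds.
interface Serializer<T> {
    byte[] toBytes(T value);
    T fromBytes(byte[] bytes);
}

// Hypothetical example type; in practice this might be a protobuf message.
final class User {
    final String name;
    final int age;

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Hand-rolled serializer standing in for protobufs/JSON: writes the name as
// length-prefixed modified UTF-8, then the age as a 4-byte big-endian int.
final class UserSerializer implements Serializer<User> {
    @Override
    public byte[] toBytes(User user) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(user.name);
            out.writeInt(user.age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    @Override
    public User fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new User(in.readUTF(), in.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical typed facade: callers work with T objects, while the backing
// store (a Map here, an HBase table in practice) only ever sees byte[].
final class TypedStore<T> {
    private final Map<String, byte[]> backing = new HashMap<>();
    private final Serializer<T> serializer;

    TypedStore(Serializer<T> serializer) {
        this.serializer = serializer;
    }

    void put(String rowKey, T value) {
        backing.put(rowKey, serializer.toBytes(value));
    }

    // Returns null when the row is absent, otherwise the decoded value.
    T get(String rowKey) {
        byte[] raw = backing.get(rowKey);
        return raw == null ? null : serializer.fromBytes(raw);
    }
}

class SerializationSketch {
    public static void main(String[] args) {
        TypedStore<User> users = new TypedStore<>(new UserSerializer());
        users.put("row1", new User("Tom", 30));
        User back = users.get("row1");
        System.out.println(back.name + " " + back.age); // prints "Tom 30"
    }
}
```

Keeping the serializer outside the store mirrors the point about keeping HBase "just" a KeyValue store: the server only handles `byte[]`, and changing the encoding means supplying a different `Serializer` implementation rather than changing the store.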
