IMO, typing or structuring the cell content belongs in the layer above
hbase.

As I see it, we could choose an explicit representation, but then
applications that wanted to use an alternative would be paying a tax for the
serialization dissonance.  Those who wanted to persist binary would see
their data bloat; e.g. reformatted as json, with base64'ing to be safe.
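
To put a rough number on that bloat, here is a minimal sketch. It assumes a
made-up JSON envelope with a single "value" field; the class and method
names are hypothetical and nothing here is part of hbase. Base64 turns every
3 bytes into 4 characters, so the cell grows by about a third before the
JSON wrapper is even counted.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SerializationTax {

    /**
     * Wraps raw bytes in a minimal JSON object (hypothetical envelope),
     * base64-encoding them so they survive as text.
     */
    static byte[] asJson(byte[] raw) {
        String encoded = Base64.getEncoder().encodeToString(raw);
        // The base64 alphabet needs no escaping inside a JSON string.
        String json = "{\"value\":\"" + encoded + "\"}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = new byte[300];        // stand-in for a binary cell value
        byte[] wrapped = asJson(raw);
        System.out.println("raw bytes:  " + raw.length);
        System.out.println("json bytes: " + wrapped.length);
    }
}
```

An application that stores the raw bytes directly pays none of this, which
is the argument for leaving the cell untyped.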

I don't think it would be too hard to add a generalized system atop hbase
that would allow use of protobuf, json, thrift or avro, etc.  The tools are
there to do such a thing, I believe.  The column descriptor -- and maybe
even the table descriptor (would need to check) -- can take arbitrary
key/value attributes.  The serializing layer could write metadata describing
the libs used, etc., to the table descriptor, and write column family
specifics to the column family attributes on the column descriptor.

Of note, we can only add attributes at the column family level, not down at
the column level.
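
A minimal sketch of what such a layer might look like, assuming the column
family attributes are available as plain string key/value pairs.  The
attribute key "serde.format", the class name and the registry are all
hypothetical, and a java.util.Map stands in for the real column descriptor
so the example runs without hbase on the classpath.  Two charsets stand in
for the real formats (protobuf, json, thrift, avro) for the same reason.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class FamilySerde {

    // Hypothetical attribute key the layer would store on each column family.
    static final String FORMAT_KEY = "serde.format";

    // Registry of encoders keyed by format name. A real layer would register
    // protobuf, json, thrift or avro encoders here instead of charsets.
    private static final Map<String, Function<String, byte[]>> ENCODERS = new HashMap<>();
    static {
        ENCODERS.put("utf8", s -> s.getBytes(StandardCharsets.UTF_8));
        ENCODERS.put("utf16", s -> s.getBytes(StandardCharsets.UTF_16BE));
    }

    /**
     * Picks the encoder named in the column family's attributes and applies
     * it. The choice is per family because attributes cannot be set per column.
     */
    static byte[] encode(Map<String, String> familyAttributes, String value) {
        String format = familyAttributes.get(FORMAT_KEY);
        Function<String, byte[]> encoder = ENCODERS.get(format);
        if (encoder == null) {
            throw new IllegalStateException("No serializer registered for format: " + format);
        }
        return encoder.apply(value);
    }

    public static void main(String[] args) {
        // Stand-in for the attributes of one column family.
        Map<String, String> attrs = new HashMap<>();
        attrs.put(FORMAT_KEY, "utf8");
        System.out.println(encode(attrs, "hello").length + " bytes written");
    }
}
```

The application never names a format when it writes; it is looked up from
the family's metadata, so every column in a family shares one serialization.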

St.Ack
P.S. Tom, on your groovyhbase, maybe start a little project up on google
code or github and add pointers on the hbase wiki to the code -- since
it's been through at least one revision and has had interest from others?



2009/4/17 Tom Nichols <[email protected]>

> I thought about protobufs  - That's probably the most straightforward
> since it is easily convertible to and from byte arrays.  I guess what
> I'm thinking about is a "standard" serialization mechanism so I can
> put a pretty face on the HBase data access API without having to do
> the serialization and deserialization myself.  I'm probably just
> being lazy :)
