IMO, typing or structuring the cell content belongs in the layer above hbase.
As I see it, we could choose an explicit representation, but then applications that wanted to use an alternate would be paying a tax for the serialization dissonance. Those who wanted to persist binary would see their data bloat; e.g. they would have to reformat it as json with base64'ing to be safe.

I don't think it would be too hard to add a generalized system atop hbase that would allow use of protobuf, json, thrift, avro, etc. The tools are there to do such a thing, I believe. The column descriptor -- and maybe even the table descriptor (would need to check) -- can take arbitrary key/value attributes. The serializing layer could write metadata describing the libs used, etc., into the table descriptor, and then write column family specifics into the column descriptor's attributes. Of note, we can only add attributes at the column family level, not down at the column level.

St.Ack

P.S. Tom, on your groovyhbase, maybe start up a little project on google code or github and add pointers on the hbase wiki to the code, since it's been through at least one revision and has had interest from others?

2009/4/17 Tom Nichols <[email protected]>
> I thought about protobufs - That's probably the most straightforward
> since it is easily convertible to and from byte arrays. I guess what
> I'm thinking about is a "standard" serialization mechanism so I can
> put a pretty face on the HBase data access API without having to do
> the serialization and deserialization myself. I'm probably just
> being lazy :)
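
To make the descriptor-attribute idea above a bit more concrete, here is a rough sketch in Python of what such a layer might look like. It does not touch the hbase client API at all: FamilySchema is a made-up stand-in for a column descriptor's key/value attributes, TypedTable keeps cells in a plain dict, and the "serializer" attribute key and the CODECS registry are names invented for illustration. The point is only that the codec is chosen per column family from stored metadata, and that the store itself only ever sees bytes.

```python
import base64
import json

# Hypothetical attribute key; hbase defines no such convention.
SERIALIZER_ATTR = "serializer"


def encode_json(value):
    """Serialize a JSON-compatible value to bytes."""
    return json.dumps(value, sort_keys=True).encode("utf-8")


def decode_json(raw):
    """Deserialize bytes produced by encode_json."""
    return json.loads(raw.decode("utf-8"))


def encode_raw(value):
    """Pass binary values through untouched."""
    return bytes(value)


def decode_raw(raw):
    return bytes(raw)


# Registry of codecs the layer knows about (illustrative, not exhaustive).
CODECS = {
    "json": (encode_json, decode_json),
    "raw": (encode_raw, decode_raw),
}


class FamilySchema:
    """Stand-in for a column descriptor carrying key/value attributes."""

    def __init__(self, name, attributes=None):
        self.name = name
        self.attributes = dict(attributes or {})


class TypedTable:
    """In-memory stand-in for a table; cells are stored as bytes only."""

    def __init__(self, families):
        self.families = {f.name: f for f in families}
        self.cells = {}  # (row, family, qualifier) -> bytes

    def _codec(self, family):
        # The codec is looked up per column family, never per column.
        name = self.families[family].attributes[SERIALIZER_ATTR]
        return CODECS[name]

    def put(self, row, family, qualifier, value):
        encode, _ = self._codec(family)
        self.cells[(row, family, qualifier)] = encode(value)

    def get(self, row, family, qualifier):
        _, decode = self._codec(family)
        return decode(self.cells[(row, family, qualifier)])


def json_base64_size(payload):
    """Size of a binary payload once wrapped as base64 inside JSON."""
    text = base64.b64encode(payload).decode("ascii")
    return len(json.dumps({"data": text}))


table = TypedTable([
    FamilySchema("info", {SERIALIZER_ATTR: "json"}),
    FamilySchema("blob", {SERIALIZER_ATTR: "raw"}),
])
table.put("row1", "info", "user", {"name": "tom"})
table.put("row1", "blob", "img", b"\x00\x01\x02")
```

Here table.get("row1", "info", "user") hands back the dict and table.get("row1", "blob", "img") hands back the original bytes, while json_base64_size shows the bloat a forced JSON representation would cost binary data (base64 alone grows it by about a third). A real layer would read and write these attributes through the descriptors rather than a dict.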
