Question: Is it an acceptable design to use the timestamp as a data element?

I am currently adding the date to the column name and setting the maximum number of versions for the table to 1.

Current:  htable.put('table', 'family:date', 'JSON');

What I would like to do instead is use the cell timestamp as a data element to store the date of the entry, and set the number of versions to unlimited.

Proposed: htable.put('table', 'family:', 'JSON', 'date');
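In the proposed layout the `'date'` argument becomes the cell's timestamp, which HBase stores as a `long` and by convention fills with milliseconds since the Unix epoch. A minimal sketch of the conversion in both directions, next to the qualifier the current layout builds instead, assuming dates are normalized to UTC midnight (the class and helper names are hypothetical, not part of HBase):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helpers; not part of the HBase client API.
public class TimestampAsDate {
    // Proposed layout: encode the date as a cell timestamp
    // (milliseconds since the epoch, UTC midnight assumed).
    static long dateToTimestamp(LocalDate date) {
        return date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // Recover the date from a cell timestamp, e.g. for a graph's x-axis.
    static LocalDate timestampToDate(long ts) {
        return Instant.ofEpochMilli(ts).atZone(ZoneOffset.UTC).toLocalDate();
    }

    // Current layout: encode the date in the column qualifier instead.
    static String qualifierForDate(String family, LocalDate date) {
        return family + ":" + date; // LocalDate prints as yyyy-MM-dd
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2008, 12, 21);
        long ts = dateToTimestamp(d);
        System.out.println(qualifierForDate("family", d)); // current layout
        System.out.println(ts);                            // proposed layout
        System.out.println(timestampToDate(ts));           // round trip
    }
}
```

Normalizing to midnight means two entries on the same day map to the same timestamp, so the later write replaces the earlier one; keeping the full time of day avoids that if it matters.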

Is this a good approach? Are there any gotchas? Is there a way to get all of the versions for a row/column in a single call? I need to graph the results over time.
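To reason about the proposed design, here is a hypothetical in-memory model of a single row/column cell (the class and method names are made up; this is not the HBase client API). It illustrates why one read can return the whole series, and two gotchas: a second write at the same timestamp replaces the first, and versions beyond the configured maximum are no longer returned. In the real client the read side corresponds to a `Get` asked for multiple versions (for example `Get.readAllVersions()` in HBase 2.x), which returns versions newest first; the model mirrors that ordering and reverses it before graphing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one HBase row/column cell with versions.
// It is NOT the HBase client API; it only mimics the semantics in this thread.
public class VersionedCell {
    private final int maxVersions; // Integer.MAX_VALUE stands in for "unlimited"
    // Newest-first ordering, matching the order in which HBase returns versions.
    private final NavigableMap<Long, String> versions =
            new TreeMap<Long, String>(Collections.<Long>reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Gotcha 1: writing the same timestamp twice replaces the earlier value.
    void put(long timestamp, String json) {
        versions.put(timestamp, json);
        // Gotcha 2: only the newest maxVersions entries are kept.
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry in newest-first order = oldest
        }
    }

    // A single call returns every retained version, oldest first, ready to graph.
    List<Map.Entry<Long, String>> allVersionsChronological() {
        return new ArrayList<>(versions.descendingMap().entrySet());
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(Integer.MAX_VALUE);
        cell.put(300L, "{\"v\":3}");
        cell.put(100L, "{\"v\":1}");
        cell.put(200L, "{\"v\":2}");
        cell.put(200L, "{\"v\":2.5}"); // replaces the value stored at timestamp 200
        for (Map.Entry<Long, String> e : cell.allVersionsChronological()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With an unlimited maximum the series only grows, so the column family's version limit is the setting to check first if older points seem to disappear.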

On Dec 21, 2008, at 8:11 AM, Andrew Purtell wrote:

I use JSON for exactly this. A simple row/column/timestamp
key leads to a compound structure encoding all of the object
attributes, or maybe arrays of objects, etc. At the scale
where HBase is an effective solution you need to
denormalize ("insert time join") for query efficiency anyhow,
and I can serve the results out as is. Most of the work then
is done in the mapreduce tasks that produce and store the
JSON encodings in batch. I also build several views of the
data into multiple tables -- materialized views basically.
At Hadoop/HBase scale, disk space is cheap, seek time is not.
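The "insert time join" plus materialized views described above can be sketched as encoding a joined record once as JSON and writing that same value into more than one table, each keyed for a different query. A minimal, hypothetical sketch using plain Java maps in place of HBase tables (the row-key formats `user/date` and `date/user`, the sample attributes, and all names are assumptions; a real job would use a JSON library and the HBase client, typically from the MapReduce tasks mentioned above):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: denormalize at write time, store one JSON value in two views.
public class DenormalizedViews {
    // Minimal JSON string escaping (quotes and backslashes only; assumes the
    // input contains no control characters).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Encode one object's attributes as a flat JSON object, keeping insertion order.
    static String toJson(Map<String, String> attrs) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    // Sorted maps from row key to JSON value, standing in for two tables.
    final Map<String, String> byUser = new TreeMap<>();
    final Map<String, String> byDate = new TreeMap<>();

    // Encode once, write twice: extra disk space in exchange for fewer seeks.
    void store(String user, String date, Map<String, String> attrs) {
        String json = toJson(attrs);
        byUser.put(user + "/" + date, json); // view 1: rows clustered per user
        byDate.put(date + "/" + user, json); // view 2: rows clustered per date
    }

    public static void main(String[] args) {
        DenormalizedViews views = new DenormalizedViews();
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("user", "alice");   // hypothetical sample data
        attrs.put("city", "Seattle");
        views.store("alice", "2008-12-21", attrs);
        System.out.println(views.byUser);
        System.out.println(views.byDate);
    }
}
```

Because each view already holds the fully joined value, a read is a single lookup that can be served as is, which is the trade-off the reply describes.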

Because of this, query processing time is low enough that I
can serve results right out of HBase without needing an
intermediate caching layer such as memcached or Tokyo
Cabinet (jgray's favorite).
