JSON+
Question: Is it an acceptable design to use the timestamp as a data
element?
I am currently adding the date to the column name and setting the
number of versions in the table to 1.
Current: htable.put('table', 'family:date', 'JSON');
What I would like to do is use the timestamp as a data element to
store the date of the entry and set the number of versions to unlimited.
Proposed: htable.put('table', 'family:', 'JSON', 'date');
Is this a good approach? Are there any gotchas? Is there a way to
get all of the versions for a row/column in a single call? I need to
graph the results over time.
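To make the proposed layout concrete, here is a minimal sketch in Python of
what "timestamp as a data element" means. It is a toy in-memory model, not
the HBase client API: VersionedTable, put, get_versions, day_millis, the
row key "sensor-1" and the JSON payloads are all hypothetical names and
data chosen for illustration.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def day_millis(year, month, day):
    """Epoch milliseconds for midnight UTC on the given date."""
    moment = datetime(year, month, day, tzinfo=timezone.utc)
    return int(moment.timestamp()) * 1000


class VersionedTable:
    """Toy stand-in for a table of versioned cells (NOT the HBase API)."""

    def __init__(self, max_versions=None):
        # None means "keep every version"; otherwise an integer >= 1.
        self.max_versions = max_versions
        # (row, column) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row, column, value, timestamp):
        versions = self._cells[(row, column)]
        # A second write at the same timestamp replaces the first one.
        versions[timestamp] = value
        if self.max_versions is not None:
            # Drop the oldest timestamps beyond the configured limit.
            for old in sorted(versions)[:-self.max_versions]:
                del versions[old]

    def get_versions(self, row, column):
        """Return all (timestamp, value) pairs for one cell, newest first."""
        versions = self._cells.get((row, column), {})
        return sorted(versions.items(), reverse=True)


# Hypothetical data: one JSON document per day, dated by the timestamp.
t1, t2, t3 = day_millis(2008, 12, 19), day_millis(2008, 12, 20), day_millis(2008, 12, 21)
table = VersionedTable()
table.put("sensor-1", "family:", json.dumps({"value": 10}), t1)
table.put("sensor-1", "family:", json.dumps({"value": 12}), t2)
table.put("sensor-1", "family:", json.dumps({"value": 15}), t3)
# Re-writing the same date overwrites instead of adding a version.
table.put("sensor-1", "family:", json.dumps({"value": 16}), t3)

# One call returns every version; reverse it for a time-ordered series.
series = [(ts, json.loads(doc)["value"])
          for ts, doc in reversed(table.get_versions("sensor-1", "family:"))]
```

The model shows the two gotchas I would expect with this design: a cell
is identified by row, column and timestamp, so two entries for the same
date collapse into one, and a version limit silently discards older
entries. With the limit lifted, a single read of one row/column yields
the whole series for graphing.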
On Dec 21, 2008, at 8:11 AM, Andrew Purtell wrote:
I use JSON for exactly this. A simple row/column/timestamp
key leads to a compound structure encoding all of the object
attributes, or maybe arrays of objects, etc. At the scale
where HBase is an effective solution you need to
denormalize ("insert time join") for query efficiency anyhow,
and I can serve the results out as is. Most of the work then
is done in the mapreduce tasks that produce and store the
JSON encodings in batch. I also build several views of the
data into multiple tables -- materialized views basically.
At Hadoop/HBase scale, disk space is cheap, seek time is not.
Because of this, query processing time is low enough that I
can serve results right out of HBase without needing an
intermediate caching layer such as memcached or Tokyo
Cabinet (jgray's favorite).
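The "insert time join" and materialized-view idea described above can be
sketched roughly as follows. This is only an illustration in plain Python
with dictionaries standing in for tables; build_views, the users and
orders records, and the two view names are hypothetical and are not taken
from the setup described in the reply.

```python
import json
from collections import defaultdict

# Hypothetical normalized inputs to the batch job.
users = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}}
orders = [
    {"order_id": "o1", "user_id": "u1", "item": "disk", "day": "2008-12-20"},
    {"order_id": "o2", "user_id": "u2", "item": "cable", "day": "2008-12-20"},
    {"order_id": "o3", "user_id": "u1", "item": "fan", "day": "2008-12-21"},
]


def build_views(users, orders):
    """Join at write time and emit one JSON document per row key, per view."""
    by_user = defaultdict(list)
    by_day = defaultdict(list)
    for order in orders:
        # The join happens here, once, instead of on every query.
        joined = dict(order, user_name=users[order["user_id"]]["name"])
        # The same joined record is written into each view (costs disk, saves seeks).
        by_user[order["user_id"]].append(joined)
        by_day[order["day"]].append(joined)
    view_by_user = {key: json.dumps(rows) for key, rows in by_user.items()}
    view_by_day = {key: json.dumps(rows) for key, rows in by_day.items()}
    return view_by_user, view_by_day


view_by_user, view_by_day = build_views(users, orders)

# A query is then a single key lookup whose value can be served as is.
response_body = view_by_day["2008-12-20"]
```

Each view duplicates the joined records under a different key, which is
the disk-for-seeks trade described above: the stored JSON is already the
response, so serving it needs one lookup and no further processing.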