Re: "keyed by something like [timestamp, action details, session ID]"

Read the section on monotonically increasing row keys in the HBase book.  A key 
that leads with a timestamp sends every new write to the same region.  There 
have been lots of other threads on the dist-list about this topic too.
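
As a rough illustration of the alternative, here is a minimal sketch of a key 
that does not lead with the timestamp.  Everything in it is an assumption made 
for illustration, not something from the HBase book or from your schema: the 
function name action_row_key, the NUM_BUCKETS value, the one-byte hash-derived 
bucket prefix, and the field order [bucket, session ID, timestamp, action].  It 
only builds the key bytes and does not call any HBase API.

```python
import hashlib
import struct

# Hypothetical number of salt buckets; tune to the cluster, not a recommendation.
NUM_BUCKETS = 16


def action_row_key(session_id: str, timestamp_ms: int, action: str) -> bytes:
    """Build a row key that leads with a hash-derived bucket, not the timestamp.

    Assumed layout: [bucket byte][session ID][0x00][8-byte big-endian ms][action]
    """
    session_bytes = session_id.encode("utf-8")
    # Derive the bucket from the session ID so one session always lands in the
    # same bucket, while different sessions spread across the key space.
    bucket = hashlib.md5(session_bytes).digest()[0] % NUM_BUCKETS
    return (
        bytes([bucket])
        + session_bytes
        + b"\x00"  # separator, assuming session IDs never contain a zero byte
        + struct.pack(">Q", timestamp_ms)  # big-endian keeps byte order == time order
        + action.encode("utf-8")
    )


if __name__ == "__main__":
    for ts in (1307989740000, 1307989741000):
        print(action_row_key("session-42", ts, "click").hex())
```

With a layout like this, the actions of one session still sort by time and can 
be read with a single prefix scan, but writes arriving at the same moment from 
different sessions no longer all target the tail of the table.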


-----Original Message-----
From: Leif Wickland [mailto:leifwickl...@gmail.com] 
Sent: Monday, June 13, 2011 1:29 PM
To: user@hbase.apache.org
Subject: Re: Question from HBase book: "HBase currently does not do well with 
anything about two or three column families"

>
> If they have divergent read and write patterns why not put them in 
> separate tables?
>

That's an entirely fair question.  I'm new to this.  I figured if the data was 
related to the same thing and could have the same key, then it ought to go into 
various CFs on that key in a single table.  I got the feeling from reading the 
BigTable paper that the typical design approach was to dump lots of CFs into a 
table.  It seems like that's not the HBase-way, though.

For the most part it's not a big deal to store the data in separate tables.  
However, I'm curious what you'd recommend for one particular part of it.  
Specifically, I'd like to store actions within a web visit.  I've been planning 
to store individual actions as columns in their own column family, keyed by 
something like [timestamp, action details, session ID].  In another column 
family I'd been planning on storing statistics about the actions, such as first 
time, end time, count, etc.  When writing to the actions CF, I'd need to read 
from and possibly update the stats CF.  Would your recommendation be to store 
that kind of data in the same CF, two CFs in the same table, or in two separate 
tables?

My thought was that I could use row locking to avoid races to update the stats 
after inserting into actions if I took the two CF approach.

Thanks for your feedback,

Leif Wickland
