Re: "Is that still considered current?  Do folks on the list generally agree 
with that guideline?"

Yes and yes.  HBase runs better with fewer CFs.
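
To make the reason concrete, here is a rough sketch of the per-Region flush behavior that the quoted book section describes. It is a toy model, not HBase code: the 64 MB threshold, the family names "big" and "small", and the simulate() function are all made up for illustration.

```python
from collections import defaultdict

MIB = 1024 * 1024
KIB = 1024
FLUSH_THRESHOLD = 64 * MIB  # hypothetical per-region memstore limit, in bytes


def simulate(writes, flush_threshold=FLUSH_THRESHOLD):
    """Toy model of per-region flushing.

    `writes` is an iterable of (family, size_in_bytes). When the region's
    total buffered size reaches the threshold, every family that has
    buffered data writes out one store file, however small.
    Returns {family: [store file sizes]}.
    """
    memstore = defaultdict(int)
    store_files = defaultdict(list)
    for family, size in writes:
        memstore[family] += size
        if sum(memstore.values()) >= flush_threshold:
            for fam, buffered in memstore.items():
                if buffered > 0:
                    store_files[fam].append(buffered)
            memstore.clear()
    return dict(store_files)


# One heavy family and one light family sharing a region (made-up workload):
# 640 writes of 1 MiB to "big", plus a 1 KiB write to "small" every 10th step.
writes = []
for i in range(640):
    writes.append(("big", 1 * MIB))
    if i % 10 == 0:
        writes.append(("small", 1 * KIB))

files = simulate(writes)
for fam, sizes in files.items():
    print(fam, "files:", len(sizes), "largest (bytes):", max(sizes))
```

In this model "small" ends up with exactly as many store files as "big", each only a few KB, because every flush is driven by "big". Since compaction is triggered by file count rather than size, those tiny files then count toward compaction just like the large ones.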



-----Original Message-----
From: Leif Wickland [mailto:leifwickl...@gmail.com] 
Sent: Thursday, June 02, 2011 5:41 PM
To: user@hbase.apache.org
Subject: Question from HBase book: "HBase currently does not do well with 
anything about two or three column families"

I was reading through the HBase book and came across the following in *6.2. On 
the number of column families* <http://hbase.apache.org/book.html#number.of.cfs>:

*"HBase currently does not do well with anything above two or three column 
families so keep the number of column families in your schema low.
Currently, flushing and compactions are done on a per Region basis so if one 
column family is carrying the bulk of the data bringing on flushes, the 
adjacent families will also be flushed though the amount of data they carry is 
small. Compaction is currently triggered by the total number of files under a 
column family. It's not size based. With many column families, the flushing and 
compaction interaction can make for a bunch of needless I/O loading (to be 
addressed by changing flushing and compaction to work on a per column family 
basis).*

*Try to make do with one column family if you can in your schemas. Only 
introduce a second and third column family in the case where data access is 
usually column scoped; i.e. you query one column family or the other but 
usually not both at the one time."*

Is that still considered current?  Do folks on the list generally agree with 
that guideline?

The reason that I ask is that I'm designing a data model which currently has 
five column families.  I expect each of those column families to have divergent 
read and write patterns.  Do you think I should look for ways to reduce the 
number of CFs?

Thanks,

Leif Wickland
