For the record, the refGuide mentions potential issues of CF lumpiness that you mentioned:
http://hbase.apache.org/book.html#number.of.cfs

6.2.1. Cardinality of ColumnFamilies

Where multiple ColumnFamilies exist in a single table, be aware of the
cardinality (i.e., number of rows). If ColumnFamilyA has 1 million rows and
ColumnFamilyB has 1 billion rows, ColumnFamilyA's data will likely be spread
across many, many regions (and RegionServers). This makes mass scans for
ColumnFamilyA less efficient.

... anything that needs to be updated/added for this?

On 4/8/13 12:39 AM, "lars hofhansl" <la...@apache.org> wrote:

>I think the main problem is that all CFs have to be flushed if one gets
>large enough to require a flush.
>(Does anyone remember why exactly that is? And do we still need that now
>that the memstoreTS is stored in the HFiles?)
>
>So things are fine as long as all CFs have roughly the same size. But if
>you have one that gets a lot of data and many others that are smaller,
>we'd end up with a lot of unnecessary and small store files from the
>smaller CFs.
>
>Anything else known that is bad about many column families?
>
>-- Lars
>
>________________________________
>From: Andrew Purtell <apurt...@apache.org>
>To: "user@hbase.apache.org" <user@hbase.apache.org>
>Sent: Sunday, April 7, 2013 3:52 PM
>Subject: Re: schema design: rows vs wide columns
>
>Is there a pointer to evidence/experiment backed analysis of this question?
>I'm sure there is some basis for this text in the book but I recommend we
>strike it. We could replace it with YCSB or LoadTestTool driven latency
>graphs for different workloads maybe. Although that would also be a big
>simplification of 'schema design' considerations, it would not be so
>starkly lacking background.
>
>On Sunday, April 7, 2013, Ted Yu wrote:
>
>> From http://hbase.apache.org/book.html#number.of.cfs :
>>
>> HBase currently does not do well with anything above two or three column
>> families so keep the number of column families in your schema low.
>>
>> Cheers
>>
>> On Sun, Apr 7, 2013 at 3:04 PM, Stack <st...@duboce.net> wrote:
>>
>> > On Sun, Apr 7, 2013 at 11:58 AM, Ted <yuzhih...@gmail.com> wrote:
>> >
>> > > With regard to number of column families, 3 is the recommended maximum.
>> > >
>> >
>> > How did you come up w/ the number '3'? Is it a 'hard' 3? Or does it
>> > depend? If the latter, on what does it depend?
>> > Thanks,
>> > St.Ack
>> >
>
>--
>Best regards,
>
>   - Andy
>
>Problems worthy of attack prove their worth by hitting back. - Piet Hein
>(via Tom White)
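
To make the flush concern Lars raises concrete, here is a rough toy model. It is not HBase code: the function name, the write rates, and the threshold are all made up for illustration, and it assumes (as described in the thread) that a flush is triggered as soon as any one CF's memstore reaches the threshold. It compares flushing every CF together against a hypothetical per-CF flush.

```python
def simulate_flushes(rates, threshold, steps, flush_all=True):
    """Toy model of memstore flushes (hypothetical, not HBase internals).

    rates:     {cf_name: units written to that CF per step}
    threshold: memstore size at which a CF requires a flush
    flush_all: if True, every non-empty CF is flushed whenever any CF
               reaches the threshold; if False, only the full CFs flush.
    Returns {cf_name: [size of each store file written]}.
    """
    memstore = {cf: 0 for cf in rates}
    store_files = {cf: [] for cf in rates}
    for _ in range(steps):
        # Apply one round of writes to every CF.
        for cf, rate in rates.items():
            memstore[cf] += rate
        full = [cf for cf in rates if memstore[cf] >= threshold]
        if not full:
            continue
        to_flush = list(rates) if flush_all else full
        for cf in to_flush:
            if memstore[cf] > 0:
                store_files[cf].append(memstore[cf])
                memstore[cf] = 0
    return store_files


# One busy CF and one quiet CF, in arbitrary size units.
rates = {"big": 64, "small": 1}
coupled = simulate_flushes(rates, threshold=128, steps=256, flush_all=True)
independent = simulate_flushes(rates, threshold=128, steps=256, flush_all=False)

for name, result in (("coupled", coupled), ("independent", independent)):
    for cf, files in result.items():
        print(name, cf, "files:", len(files), "largest:", max(files))
```

In this model the quiet CF ends up with one tiny store file per flush of the busy CF when flushes are coupled, versus a handful of full-size files when each CF flushes on its own. That is only the "many small store files" effect in isolation; it says nothing about compaction cost or read latency, which would need the kind of YCSB or LoadTestTool measurements Andrew suggests.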