Yes, you are absolutely correct. HBase must materialize the row for whatever data you retrieve. Whether that is one column family, one column, a list of columns, or the entire row, it just has to fit into memory. Fixing that requires an API change; I'm not sure whether it will make it into 0.21. But if you split your data up by column family, as you indicated, HBase only retrieves the data it needs.
-ryan

On Wed, Nov 11, 2009 at 10:25 PM, Greg Cottman <greg.cott...@quest.com> wrote:
> Hi Ryan,
>
> If you only query columns from one column family, though, won't HBase use data
> locality to fetch only enough data to populate that column family?
>
> That way you can have rows with more columns in them, and still write
> efficient queries that don't fetch all the irrelevant columns in a fat row.
>
> Cheers,
> Greg.
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryano...@gmail.com]
> Sent: Thursday, 12 November 2009 5:18 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: newbie question: what is better? one with a lot of keys OR a lot
> of tables with fewer keys?
>
> Either is fine. When you read an entire row from HBase, it must
> materialize the entire row in RAM. Thus your table width is limited if
> you wish to read the entire row at a time.
>
> On Wed, Nov 11, 2009 at 9:45 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>> Continuing this question:
>>
>> which is better for HBase, more rows with fewer columns or fewer rows with
>> more columns?
>>
>> Jeff Zhang
>>
>> On Thu, Nov 12, 2009 at 5:17 AM, TuxRacer69 <tuxrace...@gmail.com> wrote:
>>
>>> Thank you, Jean-Daniel
>>>
>>> Jean-Daniel Cryans wrote:
>>>
>>>> Alex,
>>>>
>>>> In HBase it really makes more sense to put all the data you can in a
>>>> single table, as it will be automatically partitioned and distributed
>>>> across the region servers (provided you have more than 256MB of
>>>> data).
>>>>
>>>> J-D
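To make Ryan's point concrete, here is a minimal sketch. It is a toy in-memory model, not HBase code. `ToyWideTable` and `cell_count` are hypothetical names. The model stores a row as family -> qualifier -> value and counts how many cells one read has to hold in memory, first for the whole row and then when the read is restricted to a single column family. In the real HBase Java client, restricting a `Get` with `addFamily` plays roughly the role of the `families` argument here.

```python
from collections import defaultdict


class ToyWideTable:
    """Hypothetical toy model of an HBase-style table: row -> family -> qualifier -> value.
    Not the HBase API; it only illustrates which cells a read must materialize."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value):
        self._rows[row][family][qualifier] = value

    def get(self, row, families=None):
        """Return the cells of `row`, restricted to `families` if given.
        The returned dict is everything this read had to hold in memory."""
        stored = self._rows.get(row, {})  # .get() does not create missing rows
        wanted = stored.keys() if families is None else [f for f in families if f in stored]
        return {f: dict(stored[f]) for f in wanted}


def cell_count(result):
    """Number of cells materialized by one read."""
    return sum(len(cols) for cols in result.values())


t = ToyWideTable()
for i in range(1000):  # a "fat" row: 1000 bulky cells in family "raw"
    t.put("user1", "raw", f"c{i}", b"x" * 100)
t.put("user1", "meta", "name", b"alice")  # one small cell in family "meta"

whole = t.get("user1")                        # reads every family in the row
meta_only = t.get("user1", families=["meta"])  # reads only the "meta" family
print(cell_count(whole), cell_count(meta_only))  # 1001 1
```

The design choice mirrors the thread. Reading the whole row grows with the row's width, while a family-restricted read only touches that family's cells. This is why splitting rarely-needed bulky columns into their own family helps.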