On Sun, May 17, 2009 at 11:19 AM, Yair Even-Zohar <[email protected]> wrote:
> I'd like to run an efficient table get() methods and retrieve about a
> 1000 rows where each row includes about 4 columns (around 20 bytes per
> cell) with several versions per column. I assume the longest wait is for
> reading the row from the disk so I could parallelize these reads. Any
> suggestions what would be the best method?
>

0.19.x hbase or TRUNK?

> 1) How many gets() should I be running in parallel?
>

Depends on how many disks and distribution of gets over nodes in the cluster.

> 2) What's the best number of get() per region?
>

How many column families? All in one column family?

> 3) Should the row ids be randomized among the different regions?
>

It's best, yes, to distribute your get load over the cluster if you can.

Sorry for all the 'depends' and answering questions with questions. It's my culture (smile).

St.Ack
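
For reference, a minimal sketch of the "parallelize these reads" idea from the question, using a fixed-size thread pool from java.util.concurrent. This is an illustration under stated assumptions, not anything from the thread: the class name ParallelGets, the method fetchAll, and the fetchRow function are all hypothetical, and fetchRow stands in for whatever HBase read call your version provides (0.19.x and TRUNK differ). The pool size is the "how many gets in parallel" knob, and as noted above the right value depends on the number of disks and how the rows spread over the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelGets {
    /**
     * Fetches every row key using a fixed-size thread pool and returns the
     * results in the same order as rowKeys. fetchRow is a hypothetical
     * stand-in for the real per-row read (not an HBase API).
     */
    static <R> List<R> fetchAll(List<String> rowKeys,
                                Function<String, R> fetchRow,
                                int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (String key : rowKeys) {
                tasks.add(() -> fetchRow.apply(key));
            }
            // invokeAll blocks until all tasks are done and returns the
            // futures in the same order as the task list.
            List<R> results = new ArrayList<>();
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching rows", e);
        } catch (ExecutionException e) {
            // A read failed inside a worker thread; surface its cause.
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("row-" + i);
        }
        // Assumption: this lambda is a placeholder for a real table read.
        List<String> rows = fetchAll(keys, key -> "value-for-" + key, 8);
        System.out.println(rows.size() + " rows fetched");
    }
}
```

If you try something like this against a real table, it is probably safest to give each worker thread its own table handle instead of sharing one, and to start with a small pool and measure before raising it.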
