Re: question about parallel get()

stack Sun, 17 May 2009 12:58:29 -0700

On Sun, May 17, 2009 at 11:19 AM, Yair Even-Zohar <[email protected]> wrote:


> I'd like to run efficient table get() calls to retrieve about 1000
> rows, where each row includes about 4 columns (around 20 bytes per
> cell) with several versions per column. I assume the longest wait is
> reading the row from disk, so I could parallelize these reads. Any
> suggestions on the best method?
>
>

Are you on HBase 0.19.x or TRUNK?



>
>
> 1)       How many get() calls should I be running in parallel?
>


It depends on how many disks you have and on how the gets are distributed
over the nodes in the cluster.
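To make the "how many in parallel" knob concrete, here is a minimal Java
sketch that assumes nothing about the HBase client API: `fetchRow` is a
hypothetical stand-in for a single get() (it must be safe to call from
several threads), and `parallelism` is the number you would tune against
your disk count and how the rows spread over the cluster. A fixed-size
thread pool caps how many gets are in flight at once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelGets {
    /**
     * Fetches every row in rowKeys with at most `parallelism` concurrent calls.
     * fetchRow is a hypothetical stand-in for one HBase get(); it is not a real API.
     */
    public static <R> Map<String, R> fetchAll(List<String> rowKeys, int parallelism,
                                              Function<String, R> fetchRow)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Submit one task per row; the pool size bounds how many run at once.
            List<Future<R>> futures = new ArrayList<>();
            for (String key : rowKeys) {
                futures.add(pool.submit(() -> fetchRow.apply(key)));
            }
            // Collect results in the original request order.
            Map<String, R> results = new LinkedHashMap<>();
            for (int i = 0; i < rowKeys.size(); i++) {
                results.put(rowKeys.get(i), futures.get(i).get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(String.format("row%04d", i));
        }
        // Fake fetch that sleeps ~1 ms to mimic disk latency.
        Map<String, String> rows = fetchAll(keys, 16, k -> {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "value-of-" + k;
        });
        System.out.println(rows.size() + " rows fetched");
    }
}
```

Rather than guessing the pool size, it is probably easier to measure
throughput at a few values of `parallelism` against your own cluster and
stop raising it once throughput stops improving.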



>
> 2)       What's the best number of get() calls per region?
>


How many column families?  All in one column family?



>
> 3)       Should the row ids be randomized among the different regions?
>
>
It's best, yes, to distribute your get load over the cluster if you can.
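One way to act on that when you already have a batch of row keys, sketched
in Java under stated assumptions: given the regions' start keys (a
hypothetical input here; the real client locates regions itself), group
the rows by region and interleave the groups round-robin, so consecutive
requests handed to a worker pool tend to land on different regions
instead of piling onto one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RegionInterleave {
    /**
     * Groups row keys by the region whose start key is the greatest one <= the row,
     * then interleaves the groups round-robin. regionStartKeys is hypothetical
     * input and must include "" (the first region's start key).
     */
    public static List<String> interleaveByRegion(List<String> rowKeys,
                                                  List<String> regionStartKeys) {
        TreeMap<String, List<String>> byRegion = new TreeMap<>();
        for (String start : regionStartKeys) {
            byRegion.put(start, new ArrayList<>());
        }
        for (String row : rowKeys) {
            // The owning region is the one with the greatest start key <= row.
            Map.Entry<String, List<String>> region = byRegion.floorEntry(row);
            if (region == null) {
                throw new IllegalArgumentException("no region covers row " + row);
            }
            region.getValue().add(row);
        }
        // Take the i-th row from each region in turn until every group is exhausted.
        List<String> ordered = new ArrayList<>();
        boolean added = true;
        for (int i = 0; added; i++) {
            added = false;
            for (List<String> group : byRegion.values()) {
                if (i < group.size()) {
                    ordered.add(group.get(i));
                    added = true;
                }
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("", "m");
        List<String> rows = Arrays.asList("a", "b", "c", "n", "o");
        System.out.println(interleaveByRegion(rows, regions)); // [a, n, b, o, c]
    }
}
```

Feeding the interleaved list to a bounded thread pool keeps the in-flight
gets spread across regions (and, if those regions live on different
servers, across disks) rather than hitting one region at a time.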

Sorry for all the 'depends' and for answering questions with questions. It's
my culture (smile).

St.Ack
