RE: question about parallel get()

Yair Even-Zohar Sun, 17 May 2009 22:52:35 -0700

1) EC2, medium server.
2) 3 or 4 column families, with anywhere from thousands to millions of columns.



-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of stack
Sent: Sunday, May 17, 2009 10:58 PM
To: [email protected]
Subject: Re: question about parallel get()

On Sun, May 17, 2009 at 11:19 AM, Yair Even-Zohar <[email protected]> wrote:

> I'd like to run efficient table get() calls to retrieve about 1000 rows,
> where each row includes about 4 columns (around 20 bytes per cell) with
> several versions per column. I assume the longest wait is for reading
> the row from disk, so I could parallelize these reads. Any suggestions
> on the best method?
>
>

0.19.x hbase or TRUNK?



>
>
> 1) How many get() calls should I be running in parallel?
>


It depends on how many disks you have and how the gets are distributed
over the nodes in the cluster.
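To make the "how many in parallel" knob concrete, here is a minimal Java sketch, not HBase's own API: it issues one read per row key through a fixed-size thread pool and collects the results in request order. `ParallelGets`, `fetchAll` and the `fetchRow` function are hypothetical names; `fetchRow` stands in for whatever client call performs a single get(). The `parallelism` argument is the value you would tune against the number of disks and how the rows are spread over the nodes. If your client handle is not safe to share across threads (an assumption to verify for your HBase version), have `fetchRow` use one handle per thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Sketch: run one read per row key through a bounded pool of threads. */
final class ParallelGets {
    private ParallelGets() {}

    /**
     * @param rowKeys     keys to read
     * @param fetchRow    hypothetical stand-in for a single-row get() call
     * @param parallelism number of reads allowed in flight at once
     * @return results in the same order as rowKeys
     */
    static <K, V> List<V> fetchAll(List<K> rowKeys, Function<K, V> fetchRow, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<CompletableFuture<V>> pending = new ArrayList<>(rowKeys.size());
            for (K key : rowKeys) {
                // Each submitted task performs exactly one row read.
                pending.add(CompletableFuture.supplyAsync(() -> fetchRow.apply(key), pool));
            }
            List<V> results = new ArrayList<>(pending.size());
            for (CompletableFuture<V> f : pending) {
                // join() waits for the read; a failed read surfaces as CompletionException.
                results.add(f.join());
            }
            return results;
        } finally {
            pool.shutdown(); // accept no new tasks and let the worker threads exit
        }
    }
}
```

For example, `ParallelGets.fetchAll(keys, k -> k.toUpperCase(), 2)` on the keys `row1, row2, row3` returns `ROW1, ROW2, ROW3` in that order, with at most two "reads" running at once.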



>
> 2) What's the best number of get() calls per region?
>


How many column families?  All in one column family?



>
> 3) Should the row IDs be randomized among the different regions?
>
>
It's best, yes, to distribute your get load over the cluster if you can.
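One way to act on "distribute your get load" is to order the requests so that consecutive gets land in different regions. The Java sketch below assumes you already have the sorted region start keys (how you obtain them is outside this sketch) and uses `String` row keys for readability; real HBase row keys are byte arrays compared lexicographically, which matches `String` ordering only for plain ASCII keys. `RegionInterleave`, `regionOf` and `interleave` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Sketch: reorder row keys round-robin across regions. */
final class RegionInterleave {
    private RegionInterleave() {}

    /** Start key of the region holding rowKey; "" denotes the first region. */
    static String regionOf(TreeSet<String> regionStartKeys, String rowKey) {
        String start = regionStartKeys.floor(rowKey); // greatest start key <= rowKey
        return start == null ? "" : start;
    }

    static List<String> interleave(TreeSet<String> regionStartKeys, List<String> rowKeys) {
        // Bucket keys by region, keeping regions in first-seen order.
        Map<String, List<String>> byRegion = new LinkedHashMap<>();
        for (String row : rowKeys) {
            byRegion.computeIfAbsent(regionOf(regionStartKeys, row), r -> new ArrayList<>()).add(row);
        }
        // Each pass takes at most one key from every region.
        List<String> out = new ArrayList<>(rowKeys.size());
        for (int i = 0; out.size() < rowKeys.size(); i++) {
            for (List<String> bucket : byRegion.values()) {
                if (i < bucket.size()) {
                    out.add(bucket.get(i));
                }
            }
        }
        return out;
    }
}
```

With start keys `""`, `"g"` and `"p"`, the rows `a, b, c, h, i, q` come back as `a, h, q, b, i, c`, so a pool working through the list in order touches all three regions early instead of queuing everything on the first one.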

Sorry for all the 'depends' and for answering questions with questions.
It's my culture (smile).

St.Ack
