My tests have shown that 3 nodes is wholly insufficient.  I think it's
causing me to hit the xciever limit sooner than I would if I were running
10+ machines.  The issue is that with r=3 on HDFS and only 3 machines, every
machine holds a replica of every block: you get reliability but no spreading
of load.  I don't know how big the 'large EC2' instances are, but you might
want to consider running more, smaller instances for the same cost if
possible.  You would get a better spread of load across machines, which
should increase overall performance.
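
To put rough numbers on the r=3 point, here is a back-of-the-envelope sketch.
It assumes HDFS spreads replicas evenly across datanodes, which is only
approximately true in practice, and the function name replica_share is just
something I made up for illustration:

```python
def replica_share(nodes: int, replication: int = 3) -> float:
    """Fraction of all HDFS blocks each datanode stores, assuming even placement."""
    if nodes < 1 or replication < 1:
        raise ValueError("nodes and replication must be >= 1")
    # A datanode holds at most one replica of a given block, so the
    # effective replication cannot exceed the number of nodes.
    effective = min(replication, nodes)
    return effective / nodes


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        share = replica_share(n)
        print(f"{n:>2} nodes: each node stores {share:.0%} of all blocks")
```

With 3 nodes each machine carries 100% of the blocks, so every write touches
every machine; with 10 nodes it drops to about 30%.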

Also, how is it running on EC2?  What happens when your machines go away?
You have to rewrite and copy the config around, do you not?

One last thing, the master is very important, but also takes the least
load.  Running bigger iron for it seems pointless to me.  My master has a
load average of 0.00 at all times, including when I am running intense
import MR tasks that put a LA of 6+ on all my region server/datanode
servers.

-ryan

On Thu, Jan 15, 2009 at 3:05 AM, Michael Dagaev <[email protected]> wrote:

> Hi, all
>
>    How did you plan your Hbase cluster capacity ?
>
> Currently we run a cluster of 4 large EC2 instances
> (one master and 3 region servers). The throughput
> is Ok but the database is small now.
>
> Let's say we are preparing to store X terabytes.
> I guess the database size will impact the performance.
> How many servers should we run in that case ?
>
> Thank you for your cooperation,
> M.
>
