Michael,

You would need ~40 nodes just to support 3x replication on HDFS. With about 250GB of data per node, you would have around 1000 regions per node at the default 256MB region size.
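The back-of-the-envelope math, assuming the ~850GB disks from your current nodes and the default 256MB hbase.hregion.max.filesize:

  10TB x 3 replicas = 30TB on disk
  30TB / ~850GB per node ≈ 36 nodes, so ~40 with some headroom
  10TB / 40 nodes = 250GB of pre-replication data per node
  250GB / 256MB per region ≈ 1000 regions per node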
With 7.5GB of memory on each node, if you can give 3-4GB to the RegionServer, you should be able to handle that number of regions and still have sufficient memory for indexes and some caching.

With 0.19.0 Hadoop and HBase you will hit xceiver issues for sure (the usual workaround is sketched in the PS below the quoted message), but this should be resolved for the 0.20 release, at which point I am confident we could handle that load. You'd also need sufficient memory in the NameNode, though 30TB is not too much.

None of that addresses the read performance you need; you would have to run your own benchmarks with your dataset and access pattern. You should be able to measure how much concurrency you can pull out of an individual regionserver and extrapolate that out to 40 nodes; read throughput scales (close enough to) linearly if your reads are well distributed across the entire dataset. Of course, if you have hot spots, you will be limited to the performance of an individual server and will not benefit from a larger cluster.

JG

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Tuesday, February 03, 2009 8:41 AM
> To: [email protected]
> Subject: Hbase cluster configuration
>
> Hi, all
>
> Does anybody know a rule of thumb to calculate the parameters of an
> HBase cluster to handle N read/write requests/sec (100K each) and
> manage M terabytes of data?
>
> For instance, we ran a cluster of 4 hosts: each data node/region
> server host has 2 CPUs (2GHz each), 7.5GB RAM, and an 850GB disk. The
> performance is good enough for now, but what if we have to manage 10T
> with this cluster?
>
> Thank you for your cooperation,
> M.
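PS: The xceiver limit and RegionServer heap mentioned above are usually bumped along the following lines; this is only a sketch, so check the property names and defaults against your release. In hadoop-site.xml on the datanodes (the property name really is spelled "xcievers"):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
  </property>

And in hbase-env.sh, to give each HBase daemon a ~4GB heap (the value is in MB):

  export HBASE_HEAPSIZE=4000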
