Michael,

You would need ~40 nodes just to support 3x replication on HDFS. With about 250GB of data per node, you would have around 1000 regions per node at the default 256MB region size.
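The back-of-the-envelope math, assuming the ~850GB disks from your current nodes and the default 256MB hbase.hregion.max.filesize:

  10TB x 3 replicas = 30TB on disk
  30TB / ~850GB per node ≈ 36 nodes, so ~40 with some headroom
  10TB / 40 nodes = 250GB of pre-replication data per node
  250GB / 256MB per region ≈ 1000 regions per node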
With 7.5GB of memory on each node, if you can give 3-4GB to the RegionServer, you should be able to handle that number of regions and still have sufficient memory for indexes and some caching.

With 0.19.0 Hadoop and HBase you will hit xceiver issues for sure (the usual workaround is sketched in the PS below the quoted message), but this should be resolved for the 0.20 release, at which point I am confident we could handle that load. You'd also need sufficient memory in the NameNode, though 30TB is not too much.

None of that addresses the read performance you need; you would have to run your own benchmarks with your dataset and access pattern. You should be able to measure how much concurrency you can pull out of an individual regionserver and extrapolate that out to 40 nodes; read throughput scales (close enough to) linearly if your reads are well distributed across the entire dataset. Of course, if you have hot spots, you will be limited to the performance of an individual server and will not benefit from a larger cluster.

JG

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Tuesday, February 03, 2009 8:41 AM
> To: [email protected]
> Subject: Hbase cluster configuration
>
> Hi, all
>
> Does anybody know a rule of thumb to calculate the parameters of an
> HBase cluster to handle N read/write requests/sec (100K each) and
> manage M terabytes of data?
>
> For instance, we ran a cluster of 4 hosts: each data node/region
> server host has 2 CPUs (2GHz each), 7.5GB RAM, and an 850GB disk. The
> performance is good enough for now, but what if we have to manage 10T
> with this cluster?
>
> Thank you for your cooperation,
> M.
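PS: The xceiver limit and RegionServer heap mentioned above are usually bumped along the following lines; this is only a sketch, so check the property names and defaults against your release. In hadoop-site.xml on the datanodes (the property name really is spelled "xcievers"):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
  </property>

And in hbase-env.sh, to give each HBase daemon a ~4GB heap (the value is in MB):

  export HBASE_HEAPSIZE=4000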
