Hi all, I performed a cluster rebalance on my test cluster yesterday (5 regionservers/datanodes, each with approx. 400GB, for a total of approx. 2TB of HDFS) and I would like to know whether anyone on the list has seen results similar to mine.
I had a single table with a single column family and loaded it until it just about filled the entire cluster. In fact, one or two of the nodes had run out of space, yet the fifth machine only had 50% of its disks utilised (which is why I thought a rebalance was in order). There are a total of 1475 regions in the cluster.

Prior to starting the rebalance, the cluster only had about 250GB left at its disposal. After the rebalance I now have almost 800GB free.

Furthermore, I was running read tests prior to the rebalance and getting a response time of approx. 500ms per row (each row has 10000 column instances of the column family, which were deserialised as part of the test). After the rebalance my read times dropped to around 340ms.

Has anybody experienced something like this, or can anyone explain why I would see such a benefit? Does anybody regularly run a cluster rebalance on a Hadoop cluster running HBase?

Thanks,
Daniel
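P.S. For anyone who wants the figures above as relative changes, here is a quick back-of-the-envelope sketch. It only uses the approximate numbers quoted in this mail (2TB taken as 2000GB, "almost 800GB" taken as 800GB); the variable names are purely illustrative and nothing here is measured output from the cluster.

```python
# Approximate figures quoted in the mail above (all values are rough).
total_gb = 2000          # approx. 2TB of HDFS across the 5 nodes
free_before_gb = 250     # free space before the rebalance
free_after_gb = 800      # "almost 800GB" free after the rebalance
read_before_ms = 500     # per-row read time before the rebalance
read_after_ms = 340      # per-row read time after the rebalance

# Space that became available, in GB and as a share of total capacity.
freed_gb = free_after_gb - free_before_gb
freed_share = freed_gb / total_gb

# Relative reduction in per-row read time.
read_reduction = (read_before_ms - read_after_ms) / read_before_ms

print(f"Space freed: {freed_gb} GB ({freed_share:.1%} of capacity)")
print(f"Read time reduction: {read_reduction:.0%}")
```

So the rebalance coincided with roughly a quarter of the cluster's capacity becoming available again and reads getting about a third faster, which is the effect I am trying to understand.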
