> 1) I am not sure whether I should start the rebalance on the namenode or
> on each new datanode.

You can run the balancer from any node. Running it on the namenode is not recommended; it is better to run it on a node with a lighter load.

> 2) Should I set the bandwidth on each datanode or only on the namenode?

Each datanode has a limited bandwidth for rebalancing. The default value for the bandwidth is 5MB/s.

> 3) If the rebalance has started, will the data on the other nodes decrease?

Yes. After the balancer runs, data is moved from over-utilized nodes to under-utilized nodes.

> 4) Do the log details mean the balancer was killed by another one?

Multiple balancers cannot run at the same time. Only one balancer is allowed to run in the cluster at any time, to avoid data corruption. The log message means that this balancer instance exited because another one was already running.

You can refer to the document below for more details.

https://issues.apache.org/jira/secure/attachment/12368261/RebalanceDesign6.pdf

Thanks
Devaraj

________________________________________
From: yingnan.ma [yingnan...@ipinyou.com]
Sent: Wednesday, May 30, 2012 7:06 AM
To: common-user
Subject: about rebalance

Hi,

I added 5 new datanodes and want to rebalance the cluster. I started the rebalance on the namenode, and it printed the notice

"starting balancer, logging to /hadoop/logs/hadoop-hdfs-balancer-hadoop220.out"

Today I checked the log file, and it contains

"Another balancer is running. Exiting... Balancing took 5.0203 minutes"

1) I am not sure whether I should start the rebalance on the namenode or on each new datanode.

2) Should I set the bandwidth on each datanode or only on the namenode?

3) If the rebalance has started, will the data on the other nodes decrease?

4) Do the log details mean the balancer was killed by another one?

If you have any suggestions, please let me know. Thank you.

Best Regards
Malone
2012-05-30

Yingnan.Ma
E yingnan...@ipinyou.com
MSN: mayingnan_b...@hotmail.com
QQ: 230624226
Room 2101, Tower A, Building 1, Zhubang 2000, East Zone, 100 Balizhuang Xili, Chaoyang District, Beijing 100025
Beijing · Shanghai · Silicon Valley
http://www.ipinyou.com
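
To illustrate the answer to question 3 in this thread (data moves from over-utilized nodes to under-utilized nodes), here is a rough Python sketch of how nodes might be grouped by utilization. It is not the balancer's actual code and it simplifies the real classification. The function name `classify_nodes`, the node names, and the capacities are all made up for the example; the 10 percent threshold is assumed to match the balancer's default `-threshold` value.

```python
# Illustrative sketch only: this is not the Balancer's source code.
# Node names and capacities below are hypothetical.

def classify_nodes(nodes, threshold_pct=10.0):
    """Group datanodes by utilization relative to the cluster average.

    nodes maps a node name to (used_bytes, capacity_bytes).
    threshold_pct plays the role of the balancer's -threshold argument.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    average = 100.0 * total_used / total_capacity

    groups = {"over": [], "under": [], "balanced": []}
    for name, (used, cap) in sorted(nodes.items()):
        utilization = 100.0 * used / cap
        if utilization > average + threshold_pct:
            groups["over"].append(name)       # candidate source of blocks
        elif utilization < average - threshold_pct:
            groups["under"].append(name)      # candidate target for blocks
        else:
            groups["balanced"].append(name)   # within threshold of average
    return average, groups


if __name__ == "__main__":
    TB = 1024 ** 4
    cluster = {
        "old-dn1": (8 * TB, 10 * TB),  # 80% full
        "old-dn2": (7 * TB, 10 * TB),  # 70% full
        "new-dn1": (0, 10 * TB),       # freshly added, empty
        "new-dn2": (0, 10 * TB),       # freshly added, empty
    }
    average, groups = classify_nodes(cluster)
    print(f"cluster average: {average:.1f}%")
    for label, names in groups.items():
        print(label, names)
```

In this made-up cluster the two old nodes sit well above the average and the two new nodes well below it, so a rebalance would be expected to reduce the data held on the old nodes, which is the behaviour described in the reply.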