Thanks Alex.

From: Alex Loddengaard [mailto:a...@cloudera.com]
Sent: Thursday, July 08, 2010 11:39 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: rebalancing replication help

Hi Arun,

Consider setting dfs.balance.bandwidthPerSec to something as high as 20971520 (20 MB/s) for both the balancer and the setrep. You can do this by supplying -D at the command line.
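For example, something along these lines (treat this as a sketch; exact option handling varies a bit by Hadoop version, and 20971520 bytes/sec is 20 MB/s):

    # Run the balancer with the balancing bandwidth raised to ~20 MB/s
    hadoop balancer -D dfs.balance.bandwidthPerSec=20971520

    # Re-set replication to 3 across the whole tree; -w, where your
    # version supports it, waits until the new replicas are in place,
    # which makes it obvious whether progress is actually being made
    hadoop fs -setrep -R -w 3 /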
Your strategy for getting data onto the 5 new nodes is correct: balance and setrep. Just understand that these things take time.

Hope this helps.

Alex

On Wed, Jul 7, 2010 at 4:09 PM, Arun Ramakrishnan <aramakrish...@languageweaver.com> wrote:

Hi guys.

I have more than one specific question, so I am going to lay out the steps I have taken. Please comment on what I could do better.

I was trying to add 5 nodes to my existing 10-node cluster and also increase the replication factor from 2 to 3. I thought I wouldn't have to run the balancer, since HDFS would most likely place the new replicas on the new nodes. There are about 500k blocks, and I wanted everything stabilized (replication and balancing) within 24 hours. It has now been more than 24 hours, and fsck reports 30% under-replication. Is there a way to force HDFS to balance/replicate more aggressively?

It would be great if someone explained what happens to blocks, and when, in the context of:
1) rebalancing
2) -setrep
3) restarting the cluster with a higher/lower replication factor

A few questions and a specific issue here:
1) When you restart the cluster with a higher replication value than before, does it apply to existing blocks too, or only to new blocks as they are created?
2) Does the balancer take under-replicated blocks into account, or does it blindly move existing blocks around until it reaches the threshold?

And the very specific problem: -setrep hangs on one particular block for hours. Is this because the block is corrupt? But fsck said it is healthy.

Thanks
Arun
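P.S. For reference, what I ran is roughly the following (the path in the last command is a placeholder for the file owning the block that hangs):

    # Bump the replication factor from 2 to 3 everywhere
    hadoop fs -setrep -R 3 /

    # Overall health and under-replication summary
    hadoop fsck /

    # Inspect the file whose block -setrep seems stuck on
    hadoop fsck /path/to/suspect/file -files -blocks -locations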