By setting "dfs.balance.bandwidthPerSec" to 1 GB/sec, you allow each datanode to use up to 1 GB/sec for block balancing. That seems too high: even gigabit Ethernet tops out at about 125 MB/sec (1 Gbit/sec divided by 8 bits per byte), so it can't carry that much data per second.
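To put numbers on that, here is a rough Python sketch, not a measurement. It assumes the property is given in bytes per second (its default, 1048576, is 1 MB/sec) and ignores Ethernet/TCP overhead. The helper names are made up for illustration.

```python
# Rough arithmetic, not a measurement: compares a 1 GB/sec balancer cap
# with what a single gigabit Ethernet link can carry.
# dfs.balance.bandwidthPerSec takes a value in bytes per second.

MB = 1024 * 1024  # binary megabyte, matching the 1048576 default
GB = 1024 * MB


def gige_line_rate_bytes() -> int:
    """Raw 1 Gbit/s line rate in bytes/sec, ignoring protocol overhead."""
    return 1_000_000_000 // 8


def bandwidth_setting(mb_per_sec: int) -> int:
    """Hypothetical helper: value for dfs.balance.bandwidthPerSec at mb_per_sec."""
    return mb_per_sec * MB


if __name__ == "__main__":
    link = gige_line_rate_bytes()
    print(f"gigabit Ethernet ceiling: {link} bytes/sec")
    print(f"1 GB/sec cap: {GB} bytes/sec, about {GB / link:.1f}x the link")
    for mb in (10, 30):
        print(f"{mb} MB/sec -> dfs.balance.bandwidthPerSec = {bandwidth_setting(mb)}")
```

A 1 GB/sec cap is roughly 8.6 times the raw link rate, so in practice it means "no limit". A cap of 10 to 30 MB/sec leaves most of the link free for other traffic.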
When you get timeouts, it probably means your network is saturated. Were you perhaps running a big MapReduce job that required a lot of data transfer among nodes at the time? Try setting it to 10 to 30 MB/sec and see what happens.

On Sat, Jul 19, 2008 at 1:56 AM, David J. O'Dell <[EMAIL PROTECTED]> wrote:
> I'm trying to rebalance my cluster as I've added two more nodes.
> When I run the balancer with the default threshold, I am seeing timeouts in
> the logs:
>
> 2008-07-18 09:50:46,636 INFO org.apache.hadoop.dfs.Balancer: Decided to
> move block -8432927406854991437 with a length of 128 MB bytes from
> 10.11.6.234:50010 to 10.11.6.235:50010 using proxy source
> 10.11.6.234:50010
> 2008-07-18 09:50:46,636 INFO org.apache.hadoop.dfs.Balancer: Starting
> Block mover for -8432927406854991437 from 10.11.6.234:50010 to
> 10.11.6.235:50010
> 2008-07-18 09:52:46,826 WARN org.apache.hadoop.dfs.Balancer: Timeout
> moving block -8432927406854991437 from 10.11.6.234:50010 to
> 10.11.6.235:50010 through 10.11.6.234:50010
>
> I read in the balancer guide at
> http://issues.apache.org/jira/secure/attachment/12370966/BalancerUserGuide2
> that the default transfer rate is 1 MB/sec.
> I tried increasing this to 1 GB/sec, but I'm still seeing the timeouts.
> All of the nodes have GigE NICs and are on the same switch.
>
>
> --
> David O'Dell
> Director, Operations
> e: [EMAIL PROTECTED]
> t: (415) 738-5152
> 180 Townsend St., Third Floor
> San Francisco, CA 94107
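One rough way to read the quoted log: the timestamps show about two minutes between "Starting Block mover" and "Timeout". The Python sketch below estimates how long one 128 MB block takes at a few rates. It assumes the transfer actually sustains the configured rate, which a saturated network will not, and it does not use the balancer's real timeout setting. The helper names are hypothetical.

```python
from datetime import datetime

MB = 1024 * 1024
BLOCK_BYTES = 128 * MB  # "a length of 128 MB" in the Balancer log
LOG_FMT = "%Y-%m-%d %H:%M:%S,%f"  # e.g. 2008-07-18 09:50:46,636


def seconds_to_move(block_bytes: int, bytes_per_sec: float) -> float:
    """Best-case copy time if the transfer really runs at bytes_per_sec."""
    return block_bytes / bytes_per_sec


def log_gap_seconds(start: str, end: str) -> float:
    """Elapsed seconds between two log timestamps in the format above."""
    t0 = datetime.strptime(start, LOG_FMT)
    t1 = datetime.strptime(end, LOG_FMT)
    return (t1 - t0).total_seconds()


if __name__ == "__main__":
    gap = log_gap_seconds("2008-07-18 09:50:46,636", "2008-07-18 09:52:46,826")
    print(f"start -> timeout in the log: {gap:.2f} s")
    for mb in (1, 10, 30):
        secs = seconds_to_move(BLOCK_BYTES, mb * MB)
        print(f"128 MB block at {mb} MB/sec: {secs:.1f} s")
```

At the old 1 MB/sec default, one block needs about 128 s, which is longer than that two-minute window. At 10 MB/sec it needs under 13 s. So if moves still time out with a higher cap, the link itself is the likely bottleneck rather than the setting.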
