"If you're inserting into HDFS from a machine running a DataNode, the local datanode will always be chosen as one of the three replica targets." Does that mean that if replication factor is 1, whole file will be kept on one node only?
Thanks and regards.
-Rajeev Gupta

From: Aaron Kimball <aa...@cloudera.com>
To: core-user@hadoop.apache.org
Date: 06/19/2009 01:56 AM
Subject: Re: HDFS is not loading evenly across all nodes.
Please respond to: core-u...@hadoop.apache.org

Did you run the dfs put commands from the master node? If you're inserting into HDFS from a machine running a DataNode, the local datanode will always be chosen as one of the three replica targets. For more balanced loading, you should use an off-cluster machine as the point of origin.

If you experience uneven block distribution, you should also periodically rebalance your cluster by running bin/start-balancer.sh. It will work in the background to move blocks from heavily laden nodes to underutilized ones.

- Aaron

On Thu, Jun 18, 2009 at 12:57 PM, openresearch <qiming...@openresearchinc.com> wrote:
>
> Hi all,
>
> I "dfs put" a large dataset onto a 10-node cluster.
>
> When I observe the Hadoop progress (via web:50070) and each local file
> system (via df -k), I notice that my master node is hit 5-10 times harder
> than the others, so its hard drive fills up much faster. During last
> night's load it actually crashed when the hard drive was full.
>
> To my understanding, data should be spread across all nodes evenly (in a
> round-robin fashion, using 64M blocks as the unit).
>
> Is this expected behavior for Hadoop? Can anyone suggest a good way to
> troubleshoot it?
>
> Thanks
>
> --
> View this message in context:
> http://www.nabble.com/HDFS-is-not-loading-evenly-across-all-nodes.-tp24099585p24099585.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
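As a rough sketch of the periodic rebalancing Aaron describes (script names are from a standard 0.20-era install; the threshold value is only an example):

  $ hadoop dfsadmin -report               # check how full each datanode is
  $ bin/start-balancer.sh -threshold 10   # move blocks until each node is within ~10% of average utilization
  $ bin/stop-balancer.sh                  # optional; the balancer also exits on its own once the cluster is balanced

The balancer can run while the cluster is in use; it works in the background and throttles the bandwidth it consumes, so a badly skewed cluster may take a while (or several passes) to even out.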