"If you're inserting
into HDFS from a machine running a DataNode, the local datanode will always
be chosen as one of the three replica targets."
Does that mean that if the replication factor is 1, the whole file will be
kept on one node only?
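
(For context, I assume the replication factor is controlled by the
dfs.replication property in the site configuration, or per file with
something like the following, where the path is just an example:

    hadoop fs -setrep -w 1 /user/rajeev/dataset.txt

so with a factor of 1 there would only ever be a single copy of each block.)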

Thanks and regards.
-Rajeev Gupta



                                                                           
From:     Aaron Kimball <aa...@cloudera.com>
To:       core-user@hadoop.apache.org
Date:     06/19/2009 01:56 AM
Subject:  Re: HDFS is not loading evenly across all nodes.
Reply-To: core-u...@hadoop.apache.org




Did you run the dfs put commands from the master node?  If you're inserting
into HDFS from a machine running a DataNode, the local datanode will always
be chosen as one of the three replica targets. For more balanced loading,
you should use an off-cluster machine as the point of origin.
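
For example, assuming your NameNode runs on a host named "namenode" and
listens on port 9000 (adjust for your cluster), you could run the put from a
machine that is not running a DataNode:

    hadoop fs -fs hdfs://namenode:9000/ -put /local/path/dataset /user/data/dataset

With no local DataNode, the first replica of each block is placed on a
randomly chosen node rather than always landing on the writer's own node.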

If you experience uneven block distribution, you should also rebalance your
cluster periodically by running bin/start-balancer.sh. It will work in the
background to move blocks from heavily loaded nodes to underutilized ones.
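
For example (the threshold is the percentage by which a node's utilization
may deviate from the cluster average before the balancer moves its blocks;
the default is 10):

    bin/start-balancer.sh -threshold 5
    bin/stop-balancer.sh    # stops a running balancer early if needed

You can check per-node usage with "hadoop dfsadmin -report", or see where the
blocks of a particular file ended up with
"hadoop fsck /path/to/file -files -blocks -locations".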

- Aaron

On Thu, Jun 18, 2009 at 12:57 PM, openresearch <
qiming...@openresearchinc.com> wrote:

>
> Hi all
>
> I "dfs put" a large dataset onto a 10-node cluster.
>
> When I observe the HDFS status (via the web UI on port 50070) and each
> local file system (via df -k),
> I notice that my master node is hit 5-10 times harder than the others, so
> its hard drive fills up quicker. During last night's load it actually
> crashed when the hard drive was full.
>
> To my understanding, data should be spread across all nodes evenly (in a
> round-robin fashion, using 64 MB blocks as the unit).
>
> Is this the expected behavior of Hadoop? Can anyone suggest a good way to
> troubleshoot it?
>
> Thanks
>
>
> --
> View this message in context:
> http://www.nabble.com/HDFS-is-not-loading-evenly-across-all-nodes.-tp24099585p24099585.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>

