This is done on purpose to improve write performance. In practice, we run map/reduce jobs on the cluster, so every node in the cluster gets an equal chance of writing. A single-node data upload like the one described in your email is normally carried out from an off-cluster node, in which case imbalanced data distribution should not be a problem.
Hairong

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 22, 2007 4:18 PM
To: [email protected]
Subject: question on HDFS block distribution

hi guys,

when a file is being copied to HDFS, it seems that HDFS always writes the first copy of each block to the data node running on the machine that invoked the copy, and the data nodes for the replicas are selected evenly from the remaining data nodes. so, for example, on a 5-node cluster with the replication factor set to 2, if i copy an N-byte file from node 1, then node 1 will use up N bytes and nodes 2, 3, 4, 5 will use up N/4 bytes each. is this a known issue, or is there any way to configure HDFS so that the blocks are distributed evenly (so that each node uses up 2*N/5 bytes in this case)?

thanks,
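The arithmetic in the question can be sketched with a small model. This is not Hadoop code; it is a hypothetical illustration of per-node storage under a "first replica goes to the writer's local datanode" policy versus a fully even spread, using the 5-node, replication-factor-2 example from the email. Function names and the node numbering are assumptions made up for this sketch.

```python
# Hypothetical sketch (not Hadoop code): compare per-node bytes stored
# when the first replica of every block lands on the writing node,
# versus a fully even distribution of all replicas.

def local_first_usage(n_bytes, nodes, replication, writer):
    """Bytes stored per node when the first replica of every block is
    written to `writer` and the remaining replicas are spread evenly
    over the other nodes. Returns a dict: node -> bytes."""
    usage = {node: 0.0 for node in range(nodes)}
    usage[writer] += n_bytes            # first replica is always local
    others = nodes - 1
    for node in usage:
        if node != writer:              # remaining replicas split evenly
            usage[node] += n_bytes * (replication - 1) / others
    return usage

def even_usage(n_bytes, nodes, replication):
    """Bytes stored per node if all replicas were distributed evenly."""
    return {node: n_bytes * replication / nodes for node in range(nodes)}

# The example from the email: an N-byte file, 5 nodes, replication 2,
# written from node 0.
N = 1000.0
print(local_first_usage(N, nodes=5, replication=2, writer=0))
# the writer stores N bytes; each of the other 4 nodes stores N/4
print(even_usage(N, nodes=5, replication=2))
# an even spread would give every node 2*N/5 bytes
```

Running the uploader on an off-cluster node, as the reply suggests, removes the `writer` term entirely: no datanode is local to the client, so all replicas are placed across the cluster.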
