Hi ChoaChun,

Your explanation sounds right.

Thanks,
dhruba

-----Original Message-----
From: Earney, Billy C. [mailto:[EMAIL PROTECTED] 
Sent: Monday, September 10, 2007 10:44 AM
To: [email protected]
Subject: RE: Replication problem of HDFS

ChoaChun,

I'm new to hadoop, but my understanding is that the data is divided into
blocks, and that not all blocks need to be on the same node.  So if a
file has 2 blocks, the first block could be on node 1 and the second
block on node 2.  From the link below, it seems that for each block, the
client contacts the namenode and requests one or more datanodes to store
the block on.
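As a rough illustration of that per-block placement (a toy model only, not the real HDFS API; the node names and the round-robin policy are invented for the example):

```python
# Toy model of HDFS block placement: the client asks the "namenode"
# for target datanodes one block at a time, so a 2-block file can
# end up with block 0 on node1 and block 1 on node2.
from itertools import cycle

DATANODES = ["node1", "node2", "node3"]  # hypothetical cluster


def place_blocks(num_blocks, replication=1):
    """Return one list of datanodes per block (round-robin policy)."""
    nodes = cycle(DATANODES)
    placement = []
    for _ in range(num_blocks):
        placement.append([next(nodes) for _ in range(replication)])
    return placement


print(place_blocks(2))  # → [['node1'], ['node2']]
```

Note that the two blocks land on different nodes even with replication = 1, which matches the "not all blocks on the same node" point above.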

http://lucene.apache.org/hadoop/hdfs_design.html#Replication+Pipelining

Is my understanding of the documentation correct?


-----Original Message-----
From: ChaoChun Liang [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 06, 2007 9:23 PM
To: [email protected]
Subject: RE: Replication problem of HDFS


So, the upload process (from local file system to HDFS) will store all
blocks (split from the dataset, say M blocks) on a single node (depending
on which client you use for the put), not across all datanodes.
And "replication" means replicating to N clients (if replication = N),
with each client owning a complete set of all M blocks. If I am wrong,
please correct me. Thanks.
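For what it's worth, the design document linked above describes replication per block rather than per file: with replication = N, each individual block is stored on N datanodes, and the replica sets for different blocks need not be the same machines. A toy sketch of that distinction (datanode names and the random placement policy are invented for illustration):

```python
# Toy contrast to the "N complete copies of the file" reading:
# with per-block replication, each *block* gets N replicas, but no
# single datanode necessarily holds all M blocks of the file.
import random

DATANODES = ["dn1", "dn2", "dn3", "dn4", "dn5"]  # hypothetical cluster


def replicate_per_block(num_blocks, n, seed=0):
    """Pick n distinct datanodes independently for every block."""
    rng = random.Random(seed)
    return {b: sorted(rng.sample(DATANODES, n)) for b in range(num_blocks)}


placement = replicate_per_block(num_blocks=4, n=3)
for block, replicas in placement.items():
    print(block, replicas)  # every block has exactly 3 replicas
```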

ChaoChun


Stu Hood-2 wrote:
> 
> ChaoChun,
> 
> Since you set the 'replication = 1' for the file, only 1 copy of the
> file's blocks will be stored in Hadoop. If you want all 5 machines to
> have copies of each block, then you would set 'replication = 5' for
> the file.
> 
> The default for replication is 3.
> 
> Thanks,
> Stu
> 
> 

-- 
View this message in context:
http://www.nabble.com/Replication-problem-of-HDFS-tf4382878.html#a12534839
Sent from the Hadoop Users mailing list archive at Nabble.com.
