According to the HDFS reference, that sounds right.
ChaoChun

Earney, Billy C. wrote:
>
> ChaoChun,
>
> I'm new to Hadoop, but my understanding is that the data is divided into
> blocks, and that not all blocks need to be on the same node. So if a file
> has 2 blocks, the first block could be on node 1 and the second block
> could be on node 2. From the link below, it seems that for each block,
> the client contacts the namenode and requests one or more datanodes to
> store the block on.
>
> http://lucene.apache.org/hadoop/hdfs_design.html#Replication+Pipelining
>
> Is my understanding of the documentation correct?
>
>
> -----Original Message-----
> From: ChaoChun Liang [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 06, 2007 9:23 PM
> To: [email protected]
> Subject: RE: Replication problem of HDFS
>
>
> So the upload process (from the local file system to HDFS) will store all
> blocks (split from the dataset, say M blocks) on a single node (depending
> on which client you upload from), not across all datanodes. And
> "replication" means replicating to N clients (if replication = N), with
> each client owning a complete set of all M blocks. If I am wrong, please
> correct me. Thanks.
>
> ChaoChun
>
>
> Stu Hood-2 wrote:
>>
>> ChaoChun,
>>
>> Since you set 'replication = 1' for the file, only 1 copy of the file's
>> blocks will be stored in Hadoop. If you want all 5 machines to have
>> copies of each block, then you would set 'replication = 5' for the file.
>>
>> The default for replication is 3.
>>
>> Thanks,
>> Stu
>>
>>
>
> --
> View this message in context:
> http://www.nabble.com/Replication-problem-of-HDFS-tf4382878.html#a12534839
> Sent from the Hadoop Users mailing list archive at Nabble.com.
>
>

--
View this message in context: http://www.nabble.com/Replication-problem-of-HDFS-tf4382878.html#a12607024
Sent from the Hadoop Users mailing list archive at Nabble.com.
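To make the distinction in the thread concrete: HDFS replicates individual blocks across datanodes, not whole files onto single nodes. The toy model below is purely illustrative (the real namenode uses a rack-aware placement policy, and `place_blocks` is an invented helper for this sketch, not a Hadoop API), but it shows why replication = 1 still spreads a file's M blocks over the cluster, while replication = N gives each block N copies on distinct nodes.

```python
def place_blocks(num_blocks, replication, datanodes):
    """Toy round-robin model of HDFS block placement (illustrative only).

    Each of the file's blocks gets `replication` copies, each copy on a
    distinct datanode; consecutive blocks start on different nodes, so
    even with replication = 1 the blocks are spread across the cluster
    rather than all landing on one node.
    """
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            datanodes[(block + r) % len(datanodes)]
            for r in range(replication)
        ]
    return placement

nodes = ["node1", "node2", "node3", "node4", "node5"]

# replication = 1: one copy per block, but blocks land on DIFFERENT nodes
print(place_blocks(2, 1, nodes))
# → {0: ['node1'], 1: ['node2']}

# replication = 5: every block is copied to all 5 nodes
print(place_blocks(2, 5, nodes))
```

With replication = 1 there is exactly one copy of each block in the cluster (so losing one node can lose part of the file); raising the file's replication factor to 5 would put a copy of every block on each of the 5 machines, which is what the original poster expected.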
