Hi ChaoChun,

Your explanation sounds right.

Thanks,
dhruba

-----Original Message-----
From: Earney, Billy C. [mailto:[EMAIL PROTECTED]
Sent: Monday, September 10, 2007 10:44 AM
To: [email protected]
Subject: RE: Replication problem of HDFS

ChaoChun,

I'm new to Hadoop, but my understanding is that the data is divided into blocks, and that not all blocks need to be on the same node. So if a file has 2 blocks, the first block could be on node 1 and the second block on node 2. From the link below, it seems that for each block, the client contacts the namenode and requests one or more datanodes to store that block on.

http://lucene.apache.org/hadoop/hdfs_design.html#Replication+Pipelining

Is my understanding of the documentation correct?

-----Original Message-----
From: ChaoChun Liang [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 06, 2007 9:23 PM
To: [email protected]
Subject: RE: Replication problem of HDFS

So the upload process (from the local file system to HDFS) will store all the blocks (the dataset split into M blocks) on a single node (depending on which client you upload from), not across all datanodes. And "replication" means replicating to N clients (if replication = N), each client holding a complete copy of all M blocks. If I am wrong, please correct me.

Thanks.
ChaoChun

Stu Hood-2 wrote:
>
> ChaoChun,
>
> Since you set 'replication = 1' for the file, only 1 copy of the
> file's blocks will be stored in Hadoop. If you want all 5 machines to have
> copies of each block, then you would set 'replication = 5' for the file.
>
> The default for replication is 3.
>
> Thanks,
> Stu
>

--
View this message in context: http://www.nabble.com/Replication-problem-of-HDFS-tf4382878.html#a12534839
Sent from the Hadoop Users mailing list archive at Nabble.com.
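For reference, the default replication factor Stu mentions (3) can be overridden cluster-wide. A minimal sketch, assuming the standard `dfs.replication` property (in 2007-era Hadoop this configuration lived in `hadoop-site.xml`; later releases use `hdfs-site.xml`):

```xml
<!-- hadoop-site.xml (sketch): default number of copies kept per block -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication factor for new files.</description>
</property>
```

Replication can also be set per file, either at create time through the API or afterwards with the `-setrep` shell command.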
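The per-block placement Billy describes can be sketched with a toy model. This is an illustrative assumption only, not HDFS's actual placement policy (which is rack-aware and picks nodes based on topology and load): each block of a file is independently assigned to `replication` distinct datanodes, so the blocks of one file spread across the cluster, and the replication factor counts copies per block rather than complete file copies per client.

```python
import random

def place_blocks(num_blocks, datanodes, replication):
    """Toy model of HDFS block placement.

    Each block is independently assigned to `replication` distinct
    datanodes. This is NOT the real HDFS policy (which is rack-aware);
    it only illustrates that blocks of one file need not sit on one
    node, and that replication is per block.
    """
    if replication > len(datanodes):
        raise ValueError("replication cannot exceed the number of datanodes")
    return {block: random.sample(datanodes, replication)
            for block in range(num_blocks)}

# A 5-node cluster, a file of 4 blocks (M = 4), replication = 3 (the default)
placement = place_blocks(4, ["node1", "node2", "node3", "node4", "node5"], 3)
for block, nodes in sorted(placement.items()):
    print(f"block {block} -> {sorted(nodes)}")
```

With `replication = 1`, each block lands on exactly one node (though not necessarily the same node for every block); with `replication = 5` on a 5-node cluster, every node would hold a copy of every block, which matches Stu's description.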
