Hello, Stu:

>> If "replication == 2" then it will make sure that 2 copies of each of the
>> M blocks exist on datanodes.
>> No, it means to replicate the file to N datanodes. The client is only used
>> to transfer files to/from Hadoop: it doesn't do any long-term storage.

Sorry, by "client" I actually meant the datanodes. For our application it
matters whether the M blocks (described above) all exist on the same datanode
(i.e. each datanode owns a complete set of the M blocks), or whether the M
blocks are shared among the datanodes in HDFS. If these M blocks can be
shared across the datanodes, we may use HDFS; otherwise we may consider the
local file system for the map/reduce processing.

ChaoChun

-----Original Message-----
From: ChaoChun Liang
Sent: Thursday, September 6, 2007 10:23pm
To: [email protected]
Subject: RE: Replication problem of HDFS

So the upload process (from the local file system to HDFS) will store all the
blocks (the dataset split into, say, M blocks) on a single node (depending on
which client you use to put the file), not across all the datanodes. And
"replication" means replicating to N clients (if replication = N), with each
client owning a complete set of all M blocks. If I am wrong, please correct
me. Thanks.

ChaoChun

Stu Hood-2 wrote:
>
> ChaoChun,
>
> Since you set 'replication = 1' for the file, only 1 copy of the file's
> blocks will be stored in Hadoop. If you want all 5 machines to have copies
> of each block, then you would set 'replication = 5' for the file.
>
> The default for replication is 3.
>
> Thanks,
> Stu
>

--
View this message in context: http://www.nabble.com/Replication-problem-of-HDFS-tf4382878.html#a12607008
Sent from the Hadoop Users mailing list archive at Nabble.com.
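To make the distinction concrete, below is a minimal sketch (not code from
this thread) using the standard Hadoop FileSystem Java API: it asks the
namenode to keep 2 copies of every block of a file and then prints which
datanodes hold each block, showing that the M blocks are spread over the
datanodes rather than each datanode holding a whole-file copy. The path,
class name, and replication value are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockPlacementCheck {
    public static void main(String[] args) throws Exception {
        // Picks up the cluster settings from the *-site.xml files on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file that is already stored in HDFS.
        Path file = new Path("/user/chaochun/dataset.txt");

        // Ask the namenode to keep 2 copies of every block of this file
        // (the per-block meaning of "replication = 2").
        fs.setReplication(file, (short) 2);

        // Print where each of the M blocks actually lives.
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                    + ", length " + block.getLength()
                    + ", on datanodes: " + String.join(", ", block.getHosts()));
        }
        fs.close();
    }
}

Depending on the Hadoop version, similar placement information is also
available from the command line, e.g. "hadoop fsck <path> -files -blocks
-locations", and a file's replication factor can be changed with
"hadoop fs -setrep".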
