Hi all,

I am reading the Hadoop source code to study the design of the Hadoop Distributed File System, and I have some questions about file replication in HDFS.

I know the degree of replication in HDFS is configurable in a configuration file such as "hadoop-default.xml", and that the default degree is 3. I also know that Hadoop uses *ReplicationTargetChooser* to try to choose the best datanodes to store the replicas.
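For reference, this is a minimal client-side sketch of how I understand the replication factor can be set programmatically; the class name and the file path are just placeholders I made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationConfigExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // dfs.replication defaults to 3 (picked up from hadoop-default.xml);
        // a client can override it before creating new files:
        conf.setInt("dfs.replication", 2);
        FileSystem fs = FileSystem.get(conf);
        // the replication factor of an existing file can also be changed:
        fs.setReplication(new Path("/user/samuel/test.txt"), (short) 2);
        fs.close();
    }
}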
My first question is about re-replication. When the degree of replication for a file drops below the configured amount (for example, due to an extended datanode outage), I guess there is a daemon thread or some other background mechanism that performs re-replication: the namenode forces the block to be re-replicated on the remaining datanodes. However, I cannot find the source code that does this anywhere in the Hadoop source archive. Can anybody tell me how Hadoop deals with re-replication?
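To make my guess concrete, here is a toy sketch of the kind of background monitor I imagine the namenode running. This is my own code, not anything from the Hadoop source tree, and all class and method names in it are made up:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReplicationMonitorSketch implements Runnable {
    private final int targetReplication;                 // e.g. dfs.replication = 3
    private final Map<String, Integer> liveReplicaCount; // block id -> live replica count

    public ReplicationMonitorSketch(int targetReplication,
                                    Map<String, Integer> liveReplicaCount) {
        this.targetReplication = targetReplication;
        this.liveReplicaCount = liveReplicaCount;
    }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            // scan all blocks and queue the under-replicated ones
            for (Map.Entry<String, Integer> e : liveReplicaCount.entrySet()) {
                if (e.getValue() < targetReplication) {
                    scheduleReReplication(e.getKey(), targetReplication - e.getValue());
                }
            }
            try {
                Thread.sleep(3000); // periodic scan interval
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void scheduleReReplication(String blockId, int missing) {
        // in real HDFS I suppose this step would pick target datanodes (via something
        // like ReplicationTargetChooser) and ask a remaining replica holder to copy
        // the block; here it only prints what would happen
        System.out.println("block " + blockId + " needs " + missing + " more replica(s)");
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new ConcurrentHashMap<String, Integer>();
        counts.put("blk_1001", 3);
        counts.put("blk_1002", 1); // pretend two replicas were lost with a failed datanode
        new Thread(new ReplicationMonitorSketch(3, counts)).start();
    }
}

Is something like this roughly what happens, and if so, which class in the namenode implements it?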
In addition, I am confused about how Hadoop keeps the replicas consistent. I know Hadoop uses a pipelined write to try to keep the replicas consistent, but a datanode crash may still leave them inconsistent. For example, suppose a block has 3 replicas on datanode1, datanode2, and datanode3, and at some point datanode3 crashes and becomes unavailable. If we want to write to that block before datanode3 recovers, what does Hadoop do? Does it write only to the replicas on datanode1 and datanode2, or does it wait for datanode3 to come back? Waiting seems inadvisable, since it would cost too much; but if we write only to the replicas on datanode1 and datanode2, they become inconsistent with the replica on datanode3. Does Hadoop deal with this kind of inconsistency, and if so, where can I find that part of the code in the Hadoop source archive?

Hope for your reply.

regards,
Samuel