HDFS architecture documentation describes outdated placement policy
--------------------------------------------------------------------
                 Key: HADOOP-5734
                 URL: https://issues.apache.org/jira/browse/HADOOP-5734
             Project: Hadoop Core
          Issue Type: Bug
          Components: documentation
    Affects Versions: 0.20.0
            Reporter: Konstantin Boudnik
            Priority: Minor

The "Replica Placement: The First Baby Steps" section of the HDFS architecture document states:

"... For the common case, when the replication factor is three, HDFS's placement policy is to put one replica on one node in the local rack, another on a different node in the local rack, and the last on a different node in a different rack. This policy cuts the inter-rack write traffic which generally improves write performance. ..."

However, according to the code of ReplicationTargetChooser.chooseTarget(), the actual logic places both the second and the third replicas on a remote rack: two replicas end up on different nodes of the same remote rack, and one (the initial replica) stays on a node in the local rack. The sentence should therefore say something like this:

"For the common case, when the replication factor is three, HDFS's placement policy is to put one replica on one node in the local rack, another on a node in a different (remote) rack, and the last on a different node in the same remote rack. This policy cuts the inter-rack write traffic which generally improves write performance."

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
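To make the corrected wording concrete, here is a minimal Java sketch of the placement order described above (first replica on the writer's local node, second on a node in a remote rack, third on a different node in that same remote rack). This is not the actual ReplicationTargetChooser.chooseTarget() implementation; the class, method, and cluster-map names are invented for illustration only.

    import java.util.*;

    /**
     * Hypothetical sketch of the 3-replica placement order described in this
     * issue. Not the real ReplicationTargetChooser; names are illustrative.
     */
    public class PlacementSketch {

        /** nodeToRack maps node name -> rack name, e.g. "node1" -> "/rack-a". */
        static List<String> chooseThreeTargets(String writerNode,
                                               Map<String, String> nodeToRack) {
            String localRack = nodeToRack.get(writerNode);
            List<String> targets = new ArrayList<>();

            // 1st replica: on the writer's own node (local rack).
            targets.add(writerNode);

            // 2nd replica: on a node in a different (remote) rack.
            String remoteNode = null;
            for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
                if (!e.getValue().equals(localRack)) {
                    remoteNode = e.getKey();
                    break;
                }
            }
            if (remoteNode == null) {
                throw new IllegalStateException("need at least two racks");
            }
            targets.add(remoteNode);
            String remoteRack = nodeToRack.get(remoteNode);

            // 3rd replica: on a *different* node in the *same* remote rack,
            // so only one block transfer crosses the rack boundary.
            for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
                if (e.getValue().equals(remoteRack) && !e.getKey().equals(remoteNode)) {
                    targets.add(e.getKey());
                    break;
                }
            }
            return targets;
        }

        public static void main(String[] args) {
            Map<String, String> cluster = new LinkedHashMap<>();
            cluster.put("node1", "/rack-a");
            cluster.put("node2", "/rack-a");
            cluster.put("node3", "/rack-b");
            cluster.put("node4", "/rack-b");

            // Expected order: [node1, node3, node4]
            System.out.println(chooseThreeTargets("node1", cluster));
        }
    }

With this order, the documentation's claim about cutting inter-rack write traffic still holds: only the second replica's transfer crosses a rack boundary, while the third replica is written within the remote rack.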