HDFS architecture documentation describes outdated placement policy
-------------------------------------------------------------------

                 Key: HADOOP-5734
                 URL: https://issues.apache.org/jira/browse/HADOOP-5734
             Project: Hadoop Core
          Issue Type: Bug
          Components: documentation
    Affects Versions: 0.20.0
            Reporter: Konstantin Boudnik
            Priority: Minor


The "Replica Placement: The First Baby Steps" section of the HDFS 
architecture document states:

"...
For the common case, when the replication factor is three, HDFS's placement 
policy is to put one replica on one node in the local rack, another on a 
different node in the local rack, and the last on a different node in a 
different rack. This policy cuts the inter-rack write traffic which generally 
improves write performance.
..."

However, according to the code in ReplicationTargetChooser.chooseTarget(), 
the actual logic is to put the second replica on a different rack, and the 
third replica on that same rack as well. So two replicas are located on 
different nodes of a remote rack and one (the initial replica) is on a node in 
the local rack. Thus, the sentence should say something like this:

"For the common case, when the replication factor is three, HDFS's placement 
policy is to put one replica on one node in the local rack, another on a node 
in a different (remote) rack, and the last on a different node in the same 
remote rack. This policy cuts the inter-rack write traffic which generally 
improves write performance."
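
To make the proposed wording concrete, here is a minimal Java sketch of the 
three-replica case. It is not the actual ReplicationTargetChooser code: the 
class name PlacementSketch, the method chooseTargets() and the node-to-rack map 
are hypothetical, and the real chooser picks nodes randomly and checks node 
load and free space, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: a simplified stand-in, not the HDFS implementation.
final class PlacementSketch {

    /**
     * Returns up to three targets: the writer's node, then two distinct
     * nodes on a single remote rack. Assumes rackOf maps node -> rack and
     * contains the writer. Takes the first remote rack it encounters, so
     * fewer than three targets come back if that rack has only one node.
     */
    static List<String> chooseTargets(String writer, Map<String, String> rackOf) {
        String localRack = rackOf.get(writer);
        List<String> targets = new ArrayList<>();
        targets.add(writer);                 // replica 1: node on the local rack

        String remoteRack = null;
        for (Map.Entry<String, String> e : rackOf.entrySet()) {
            String node = e.getKey();
            String rack = e.getValue();
            if (rack.equals(localRack)) {
                continue;                    // skip every node on the local rack
            }
            if (remoteRack == null) {
                remoteRack = rack;           // replica 2: first node on a remote rack
                targets.add(node);
            } else if (rack.equals(remoteRack)) {
                targets.add(node);           // replica 3: another node, same remote rack
                break;
            }
        }
        return targets;
    }
}
```

With an insertion-ordered map of n1 and n2 on /rackA and n3 and n4 on /rackB, 
a writer on n1 gets the targets n1, n3, n4: one replica on the local rack and 
two on different nodes of the same remote rack.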


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
