Repository: hadoop
Updated Branches:
  refs/heads/branch-2 85363ea4b -> f7ee22505


HDFS-8942. Update hyperlink to rack awareness page in HDFS Architecture 
documentation. Contributed by Masatake Iwasaki.

(cherry picked from commit bcaf83902aa4d1e3e2cd26442df0a253eae7f633)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/f7ee2250
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/f7ee2250
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/f7ee2250

Branch: refs/heads/branch-2
Commit: f7ee22505216a8f4e800623162c9e4be1eb63e55
Parents: 85363ea
Author: Akira Ajisaka <aajis...@apache.org>
Authored: Mon Aug 24 13:52:49 2015 +0900
Committer: Akira Ajisaka <aajis...@apache.org>
Committed: Mon Aug 24 13:53:22 2015 +0900

----------------------------------------------------------------------
 hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt                     | 3 +++
 hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/f7ee2250/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
index 3246919..241540f 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -854,6 +854,9 @@ Release 2.8.0 - UNRELEASED
 
     HDFS-8809. HDFS fsck reports under construction blocks as "CORRUPT". (jing9)
 
+    HDFS-8942. Update hyperlink to rack awareness page in HDFS Architecture
+    documentation. (Masatake Iwasaki via aajisaka)
+
 Release 2.7.2 - UNRELEASED
 
   INCOMPATIBLE CHANGES

http://git-wip-us.apache.org/repos/asf/hadoop/blob/f7ee2250/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
index d07630f..af26bac 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
@@ -116,7 +116,8 @@ The placement of replicas is critical to HDFS reliability and performance. Optim
 
 Large HDFS instances run on a cluster of computers that commonly spread across many racks. Communication between two nodes in different racks has to go through switches. In most cases, network bandwidth between machines in the same rack is greater than network bandwidth between machines in different racks.
 
-The NameNode determines the rack id each DataNode belongs to via the process outlined in [Hadoop Rack Awareness](../hadoop-common/ClusterSetup.html#HadoopRackAwareness). A simple but non-optimal policy is to place replicas on unique racks. This prevents losing data when an entire rack fails and allows use of bandwidth from multiple racks when reading data. This policy evenly distributes replicas in the cluster which makes it easy to balance load on component failure. However, this policy increases the cost of writes because a write needs to transfer blocks to multiple racks.
+The NameNode determines the rack id each DataNode belongs to via the process outlined in [Hadoop Rack Awareness](../hadoop-common/RackAwareness.html).
+A simple but non-optimal policy is to place replicas on unique racks. This prevents losing data when an entire rack fails and allows use of bandwidth from multiple racks when reading data. This policy evenly distributes replicas in the cluster which makes it easy to balance load on component failure. However, this policy increases the cost of writes because a write needs to transfer blocks to multiple racks.
 
 For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on one node in the local rack, another on a different node in the local rack, and the last on a different node in a different rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure; this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks. This policy improves write performance without compromising data reliability or read performance.
 
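For readers of the context paragraph above, here is a minimal sketch of the replication-factor-three placement exactly as that paragraph words it: two replicas on different nodes in the writer's local rack, and one on a node in a different rack. This is an illustration only, not HDFS code. The function name choose_replica_nodes, the topology dictionary, and the rack and node names are all hypothetical, and the real NameNode logic handles many cases (full nodes, missing racks, other replication factors) that this sketch ignores.

```python
import random
from typing import Dict, List, Optional


def choose_replica_nodes(
    topology: Dict[str, List[str]],
    writer_rack: str,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick three nodes: two in the local rack, one in a remote rack.

    `topology` maps a rack id to the DataNode names in that rack
    (a hypothetical structure used only for this illustration).
    """
    rng = rng or random.Random()

    local_nodes = topology[writer_rack]
    if len(local_nodes) < 2:
        raise ValueError("local rack needs at least two nodes")

    # Candidate remote racks: any other rack that has at least one node.
    remote_racks = [
        rack for rack, nodes in topology.items()
        if rack != writer_rack and nodes
    ]
    if not remote_racks:
        raise ValueError("need at least one non-empty remote rack")

    # Replicas 1 and 2: two distinct nodes in the local rack.
    first, second = rng.sample(local_nodes, 2)

    # Replica 3: one node in a different rack.
    remote_rack = rng.choice(remote_racks)
    third = rng.choice(topology[remote_rack])

    return [first, second, third]
```

With a topology of three racks, the result always spans exactly two racks, which is the trade-off the paragraph describes: only one inter-rack transfer on write, at the cost of fewer racks to read from.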
