[ https://issues.apache.org/jira/browse/HDFS-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647463#comment-16647463 ]
Vinayakumar B commented on HDFS-13156:
--------------------------------------

+1

> HDFS Block Placement Policy - Client Local Rack
> -----------------------------------------------
>
>                 Key: HDFS-13156
>                 URL: https://issues.apache.org/jira/browse/HDFS-13156
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 2.9.0, 3.2.0, 3.1.1
>            Reporter: BELUGA BEHR
>            Assignee: Ayush Saxena
>            Priority: Minor
>         Attachments: HDFS-13156-01.patch
>
>
> {quote}For the common case, when the replication factor is three, HDFS’s
> placement policy is to put one replica on the local machine if the writer is
> on a datanode, otherwise on a random datanode, another replica on a node in a
> different (remote) rack, and the last on a different node in the same remote
> rack.
> {quote}
> [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Replica_Placement:_The_First_Baby_Steps]
> Having just looked over the Default Block Placement code, the way I
> understand this, is that, there are three basic scenarios:
> # HDFS client is running on a datanode inside the cluster
> # HDFS client is running on a node outside the cluster
> # HDFS client is running on a non-datanode inside the cluster
> The documentation is ambiguous concerning the third scenario. Please correct
> me if I'm wrong, but the way I understand the code, if there is an HDFS
> client inside the cluster, but it is not on a datanode, the first block will
> be placed on a datanode within the set of datanodes available on the local
> rack and not simply on any _random datanode_ from the set of all datanodes in
> the cluster.
> That is to say, if one rack has an HDFS Sink Flume Agent on a dedicated node,
> I should expect that every first block will be written to a _random datanode_
> on the same rack as the HDFS Flume agent, assuming the network topology
> script is written to include this Flume node.
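
The three scenarios listed in the description can be sketched as a small toy model. This only illustrates the reporter's reading of the placement code and is not the actual BlockPlacementPolicyDefault implementation. The class FirstReplicaSketch, its methods, and the host and rack names are hypothetical, and treating a null rack as "client outside the cluster" is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Toy model of first-replica selection; NOT Hadoop's real placement code. */
class FirstReplicaSketch {
    // Hypothetical topology: datanode host -> rack path, e.g. "dn1" -> "/rack1".
    private final Map<String, String> datanodeRacks = new LinkedHashMap<>();
    private final Random random;

    FirstReplicaSketch(Random random) {
        this.random = random;
    }

    void addDatanode(String host, String rack) {
        datanodeRacks.put(host, rack);
    }

    /** Returns the rack of a known datanode, or null if the host is not a datanode. */
    String rackOf(String host) {
        return datanodeRacks.get(host);
    }

    /**
     * @param clientHost host the writer runs on
     * @param clientRack rack resolved for the client by the topology script,
     *                   or null when the client is outside the cluster (assumption)
     */
    String chooseFirstReplica(String clientHost, String clientRack) {
        if (datanodeRacks.isEmpty()) {
            throw new IllegalStateException("no datanodes registered");
        }
        // Scenario 1: the writer is itself a datanode, so it gets the first replica.
        if (datanodeRacks.containsKey(clientHost)) {
            return clientHost;
        }
        // Scenario 3: the writer is in the topology but is not a datanode,
        // so pick a random datanode on the writer's own rack.
        if (clientRack != null) {
            List<String> sameRack = new ArrayList<>();
            for (Map.Entry<String, String> e : datanodeRacks.entrySet()) {
                if (e.getValue().equals(clientRack)) {
                    sameRack.add(e.getKey());
                }
            }
            if (!sameRack.isEmpty()) {
                return sameRack.get(random.nextInt(sameRack.size()));
            }
        }
        // Scenario 2 (or no datanode on the client's rack): any random datanode.
        List<String> all = new ArrayList<>(datanodeRacks.keySet());
        return all.get(random.nextInt(all.size()));
    }
}
```

In this model, a dedicated Flume node registered on /rack1 always gets a /rack1 datanode for the first replica, while a client with no rack information may land on any datanode in the cluster.
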
> If that is correct, can the documentation be updated to include this third
> common scenario?