[ https://issues.apache.org/jira/browse/HDFS-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647463#comment-16647463 ]

Vinayakumar B commented on HDFS-13156:
--------------------------------------

+1

> HDFS Block Placement Policy - Client Local Rack
> -----------------------------------------------
>
>                 Key: HDFS-13156
>                 URL: https://issues.apache.org/jira/browse/HDFS-13156
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 2.9.0, 3.2.0, 3.1.1
>            Reporter: BELUGA BEHR
>            Assignee: Ayush Saxena
>            Priority: Minor
>         Attachments: HDFS-13156-01.patch
>
>
> {quote}For the common case, when the replication factor is three, HDFS’s 
> placement policy is to put one replica on the local machine if the writer is 
> on a datanode, otherwise on a random datanode, another replica on a node in a 
> different (remote) rack, and the last on a different node in the same remote 
> rack.
> {quote}
> [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Replica_Placement:_The_First_Baby_Steps]
> Having just looked over the default block placement code, the way I 
> understand it, there are three basic scenarios:
>  # HDFS client is running on a datanode inside the cluster
>  # HDFS client is running on a node outside the cluster
>  # HDFS client is running on a non-datanode inside the cluster
> The documentation is ambiguous concerning the third scenario. Please correct 
> me if I'm wrong, but as I read the code, if an HDFS client is inside the 
> cluster but not on a datanode, the first replica will be placed on a datanode 
> chosen from the client's local rack, and not simply on a _random datanode_ 
> drawn from the set of all datanodes in the cluster.
> That is to say, if one rack has an HDFS Sink Flume Agent on a dedicated node, 
> I should expect the first replica of every block to be written to a _random 
> datanode_ on the same rack as the Flume agent, assuming the network topology 
> script is written to include this Flume node.
> If that is correct, can the documentation be updated to cover this third 
> common scenario?


