[ 
https://issues.apache.org/jira/browse/HDFS-6701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068943#comment-14068943
 ] 

Andrew Wang commented on HDFS-6701:
-----------------------------------

Hi Ashwin,

Just a nitty thing, otherwise +1:

{code}
+<property>
+  <name>dfs.namenode.randomize-block-locations-per-block</name>
+  <value>false</value>
+  <description>When there is no node local block, the default behavior
+    while getting block locations is that - block locations of a block 
+    are not randomized,so requests for a block go to same replica to take 
+    advantage of page cache effects. 
+    However, in some network topologies,hitting the same replica may cause 
+    issues like container taking a long time to download from hdfs and eventually
+    failing. In these cases, we could make this property "true" and randomize 
+    block locations of a block, which in turn would load balance requests
+    among replicas.
+  </description>
+</property>
{code}

* "that - block locations" remove dash
* "randomized,so" needs space
* "topologies,hitting" needs space
* "hdfs" should be HDFS
* Since this is XML, quotes need to be escaped. Or you can just remove them. 
Line breaks are also not going to show up.

Recommend something like the following (feel free to copy paste):

When fetching replica locations of a block, the replicas are sorted based on 
network distance. This configuration parameter determines whether the replicas 
at the same network distance are randomly shuffled. By default, this is false, 
such that repeated requests for a block's replicas always result in the same 
order. This potentially improves page cache behavior. However, for some network 
topologies, it is desirable to shuffle this order for better load balancing.
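
To make the behavior described above concrete, here is a rough sketch of the idea. It is not the actual NetworkTopology#sortByDistance code: the class name, the method signatures, and the string/integer representation of nodes and distances are all made up for illustration. The only things taken from this issue are that the seed is currently the block ID and that the new property switches to an unseeded shuffle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch; names and types are assumptions, not Hadoop APIs.
class ReplicaOrderSketch {

    /**
     * Orders replicas by ascending network distance. Replicas at the same
     * distance are shuffled among themselves using the supplied RNG.
     */
    static List<String> sortByDistance(Map<String, Integer> distanceByNode, Random rng) {
        // Group nodes into tiers by distance; TreeMap iterates keys in ascending order.
        TreeMap<Integer, List<String>> tiers = new TreeMap<>();
        for (Map.Entry<String, Integer> e : distanceByNode.entrySet()) {
            tiers.computeIfAbsent(e.getValue(), d -> new ArrayList<>()).add(e.getKey());
        }
        List<String> ordered = new ArrayList<>();
        for (List<String> tier : tiers.values()) {
            // Sort first so the shuffle input does not depend on map iteration order.
            Collections.sort(tier);
            Collections.shuffle(tier, rng);
            ordered.addAll(tier);
        }
        return ordered;
    }

    /**
     * Picks the RNG. With randomization off (the default), the block ID is the
     * seed, so every request for the same block sees the same order. With it
     * on, an unseeded RNG gives a fresh order per request.
     */
    static Random rngFor(long blockId, boolean randomizePerBlock) {
        return randomizePerBlock ? new Random() : new Random(blockId);
    }

    public static void main(String[] args) {
        Map<String, Integer> distances = new HashMap<>();
        distances.put("dn1", 2); // three rack-local replicas (assumed distance 2)
        distances.put("dn2", 2);
        distances.put("dn3", 2);
        distances.put("dn4", 4); // one off-rack replica (assumed distance 4)

        System.out.println("seeded:   " + sortByDistance(distances, rngFor(42L, false)));
        System.out.println("seeded:   " + sortByDistance(distances, rngFor(42L, false)));
        System.out.println("unseeded: " + sortByDistance(distances, rngFor(42L, true)));
    }
}
```

In this sketch the two seeded calls always print the same order, while the unseeded call may differ from run to run. In both cases dn4 stays last, since shuffling only happens among replicas at the same distance.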

> Make seed optional in NetworkTopology#sortByDistance
> ----------------------------------------------------
>
>                 Key: HDFS-6701
>                 URL: https://issues.apache.org/jira/browse/HDFS-6701
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.5.0
>            Reporter: Ashwin Shankar
>            Assignee: Ashwin Shankar
>         Attachments: HDFS-6701-v1.txt, HDFS-6701-v3-branch2.txt, 
> HDFS-6701-v3.txt
>
>
> Currently the seed in NetworkTopology#sortByDistance is set to the block ID, which 
> causes the RNG to generate the same pseudo-random order for each block. If no 
> node-local block location is present, this causes the same rack-local replica 
> to be hit for a particular block.
> It would be good to make the seed optional, so that one can turn it off if 
> they want the block locations of a block to be randomized.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
