[ 
https://issues.apache.org/jira/browse/HDFS-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900592#comment-13900592
 ] 

Jing Zhao commented on HDFS-5946:
---------------------------------

I guess the DataNode list here is sorted? If so, all the decommissioned DNs are 
at the end of the list, and a decommissioned first DN means that every DN in 
the list is decommissioned. So the logic that throws an exception when the 
first DN is decommissioned should be correct?

The original code before HDFS-5891 did not pick randomly in all cases; the 
random pick was only for the web UI. So I guess we do not need to worry about 
the scenario where too much traffic is directed to the same DN. WebHdfs etc. 
always tries to use the first DN, which is consistent with the sorted DN list 
logic.
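
A minimal sketch of the reasoning above, assuming the list really is sorted so 
that decommissioned DNs come last. The names SortedDnSketch, Node, 
sortLiveFirst and chooseFirst are hypothetical and are not the HDFS classes or 
methods:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortedDnSketch {
    // Hypothetical stand-in for a DataNode descriptor; not the HDFS class.
    static final class Node {
        final String name;
        final boolean decommissioned;

        Node(String name, boolean decommissioned) {
            this.name = name;
            this.decommissioned = decommissioned;
        }
    }

    // Stable sort that moves decommissioned nodes to the tail while
    // preserving the relative order of the remaining nodes.
    static List<Node> sortLiveFirst(List<Node> nodes) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparing((Node n) -> n.decommissioned));
        return sorted;
    }

    // Assumes the input is already sorted live-first. Under that assumption
    // a decommissioned head means no live node exists anywhere in the list.
    static Node chooseFirst(List<Node> sorted) {
        if (sorted.isEmpty() || sorted.get(0).decommissioned) {
            throw new IllegalStateException("no live DataNode available");
        }
        return sorted.get(0);
    }
}
```

If the list is not sorted this way, chooseFirst can throw even though live 
DNs exist later in the list, which is the case the issue description raises.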

> Webhdfs DN choosing code is flawed
> ----------------------------------
>
>                 Key: HDFS-5946
>                 URL: https://issues.apache.org/jira/browse/HDFS-5946
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, webhdfs
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> HDFS-5891 improved the performance of redirecting webhdfs clients to a DN.  
> Instead of attempting a connection with a 1-minute timeout, the NN skips 
> decommissioned nodes.
> The logic appears flawed.  It finds the index of the first decommissioned 
> node, if any, then:
> * Throws an exception if index = 0, even if other nodes later in the list are 
> not decommissioned.
> * Else picks a random node prior to the index.  Let's say there are 10 
> replicas, 2nd location is decommissioned.  All clients will be redirected to 
> the first location even though there are 8 other valid locations.
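
The selection logic described in the issue can be sketched roughly as follows. 
This is an illustration built only from the description above, not the actual 
NameNode code; DnChoiceSketch and its methods are hypothetical names, and 
chooseAmongLive is just one possible alternative, not a fix proposed in this 
thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class DnChoiceSketch {
    // Index of the first decommissioned replica, or the list size if none.
    static int firstDecommissioned(List<Boolean> decommissioned) {
        for (int i = 0; i < decommissioned.size(); i++) {
            if (decommissioned.get(i)) {
                return i;
            }
        }
        return decommissioned.size();
    }

    // Logic as described in the issue: fail when the index is 0,
    // otherwise pick a random location strictly before the index.
    static int chooseAsDescribed(List<Boolean> decommissioned, Random rng) {
        int index = firstDecommissioned(decommissioned);
        if (index == 0) {
            throw new IllegalStateException("no usable replica at the head of the list");
        }
        return rng.nextInt(index);
    }

    // Hypothetical alternative: pick uniformly among every location
    // that is not decommissioned, wherever it sits in the list.
    static int chooseAmongLive(List<Boolean> decommissioned, Random rng) {
        List<Integer> live = new ArrayList<>();
        for (int i = 0; i < decommissioned.size(); i++) {
            if (!decommissioned.get(i)) {
                live.add(i);
            }
        }
        if (live.isEmpty()) {
            throw new IllegalStateException("all replicas are decommissioned");
        }
        return live.get(rng.nextInt(live.size()));
    }
}
```

With 10 replicas and only the 2nd location decommissioned, chooseAsDescribed 
can only return index 0, while chooseAmongLive can return any of the 9 other 
indices. Whether this matters depends on whether the list is sorted with 
decommissioned DNs last, as discussed in the comment.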


