[ https://issues.apache.org/jira/browse/HDDS-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974047#comment-16974047 ]

Sammi Chen commented on HDDS-2249:
----------------------------------

Thanks [~swagle] for reporting this. One idea that comes to mind: how about using
hostname:port as the key in dnsToUuidMap? If that works, would it solve
this issue?
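A minimal sketch of the keying idea above. It assumes a plain in-memory map from `hostname:port` to datanode UUID. `HostPortIndex`, `register` and `lookup` are hypothetical names used only for illustration; they are not the actual SCMNodeManager fields or methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch: index datanodes by "hostname:port" instead of by
 * hostname alone, so each key identifies exactly one datanode even when
 * several datanodes share a host. Not an Ozone API.
 */
class HostPortIndex {
  // "hostname:port" -> datanode UUID string
  private final Map<String, String> hostPortToUuid = new HashMap<>();

  static String key(String hostname, int port) {
    return hostname + ":" + port;
  }

  public void register(String hostname, int port, String uuid) {
    hostPortToUuid.put(key(hostname, port), uuid);
  }

  public Optional<String> lookup(String hostname, int port) {
    return Optional.ofNullable(hostPortToUuid.get(key(hostname, port)));
  }

  public static void main(String[] args) {
    HostPortIndex index = new HostPortIndex();
    // Two datanodes on the same host, distinguished only by port (illustrative values).
    index.register("host1", 9001, "uuid-a");
    index.register("host1", 9002, "uuid-b");
    System.out.println(index.lookup("host1", 9001).orElse("none")); // uuid-a
    System.out.println(index.lookup("host1", 9002).orElse("none")); // uuid-b
    System.out.println(index.lookup("host1", 9003).orElse("none")); // none
  }
}
```

With this keying, each datanode on a shared host gets its own entry, so a lookup returns at most one UUID instead of a list. The catch is that the caller must know which datanode port to ask for. The address of an incoming client request does not carry that port, so whether the sortDatanodes() path can build such a key is exactly the open question in the comment.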

> SortDatanodes does not return correct orders when many DNs on a given host
> --------------------------------------------------------------------------
>
>                 Key: HDDS-2249
>                 URL: https://issues.apache.org/jira/browse/HDDS-2249
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 0.5.0
>            Reporter: Stephen O'Donnell
>            Priority: Major
>
> In HDDS-2199 ScmNodeManager.getNodeByAddress() was changed to return a list 
> of nodes rather than a single entry, to handle the case where many datanodes 
> are running on the same host.
> In SCMBlockProtocol.sortDatanodes(), it uses the results returned from 
> getNodesByAddress to determine if the client submitting the request is 
> running on a cluster node, and if it is, it attempts to sort the datanodes by 
> distance from the client machine.
> To do this, the code currently takes the first DatanodeDetails object 
> returned by getNodesByAddress() and then compares it with the other passed-in 
> nodes. If any of the passed nodes are equal to the client node (based on the 
> Java object ID) it returns a zero distance, otherwise the distance is 
> calculated.
> The sort is performed in NetworkTopologyImpl.sortByDistanceCost() which later 
> calls NetworkTopologyImpl.getDistanceCost() which is where the object 
> comparison is performed:
> {code}
> if ((node1 != null && node2 != null && node1.equals(node2)) ||
>  (node1 == null && node2 == null)) {
>  return 0;
> }
> {code}
> This does not always work when there are many datanodes on the same host, as 
> the first node returned from getNodesByAddress() is guaranteed to be on the 
> same host as the client, but the list of passed datanodes may not include 
> that datanode instance.
> To fix this, getDistanceCost() should probably compare hostnames or IP
> addresses, either as a second check or instead of object equality. However,
> this is not trivial to implement.
> The reason is that getDistanceCost() takes Node objects (not
> DatanodeDetails), and a Node has no IP or hostname field. It does have a
> getNetworkName() method, which should return the hostname. However, the
> network name is overwritten with the datanode's UUID when the datanode
> registers with the node manager, by this line in NodeManager.register():
> {code}
> datanodeDetails.setNetworkName(datanodeDetails.getUuidString());
> {code}
>  
> Note that this only affects test clusters where many DNs run on a single
> host, and it does not cause any failures. The DNs may simply be returned in
> a less than ideal order.
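The ordering problem in the quoted issue, and the hostname fallback it proposes for getDistanceCost(), can be sketched in a small hypothetical model. `Node` here exposes only a network name, which is set to the UUID as described for NodeManager.register(). `Dn`, `DistanceSketch`, the fixed `REMOTE_COST`, and the `instanceof` downcast used to reach the hostname are all illustrative assumptions, not Ozone's actual classes or the chosen fix. The downcast is one way to show why the fix is awkward when only `Node` is available.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

/** Hypothetical model of the HDDS-2249 ordering problem; not Ozone code. */
class DistanceSketch {
  /** Like the topology Node in the issue: only a network name, no hostname. */
  interface Node {
    String getNetworkName();
  }

  /** Datanode stand-in: equality by UUID, network name overwritten with the UUID. */
  static final class Dn implements Node {
    final String uuid;
    final String hostname;

    Dn(String uuid, String hostname) {
      this.uuid = uuid;
      this.hostname = hostname;
    }

    @Override
    public String getNetworkName() {
      return uuid; // mirrors setNetworkName(getUuidString())
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Dn && ((Dn) o).uuid.equals(uuid);
    }

    @Override
    public int hashCode() {
      return uuid.hashCode();
    }
  }

  // Assumed flat cost for any two distinct nodes; a real topology walks the tree.
  static final int REMOTE_COST = 4;

  // Current behaviour: zero only if both are null or the nodes are equal.
  static int equalityOnly(Node a, Node b) {
    return Objects.equals(a, b) ? 0 : REMOTE_COST;
  }

  // Sketched fix: also zero when two datanodes share a hostname.
  static int withHostFallback(Node a, Node b) {
    if (Objects.equals(a, b)) {
      return 0;
    }
    if (a instanceof Dn && b instanceof Dn
        && ((Dn) a).hostname.equals(((Dn) b).hostname)) {
      return 0;
    }
    return REMOTE_COST;
  }

  // Stable sort of the passed-in nodes by distance from the client node.
  static List<Node> sort(Node client, List<Node> nodes, boolean hostFallback) {
    List<Node> sorted = new ArrayList<>(nodes);
    sorted.sort(Comparator.comparingInt((Node n) ->
        hostFallback ? withHostFallback(client, n) : equalityOnly(client, n)));
    return sorted;
  }

  static List<String> names(List<Node> nodes) {
    List<String> out = new ArrayList<>();
    for (Node n : nodes) {
      out.add(n.getNetworkName());
    }
    return out;
  }

  public static void main(String[] args) {
    Dn client = new Dn("uuid-1", "host-a"); // first node from getNodesByAddress()
    // The passed list does not contain uuid-1, but uuid-2 is on the client's host.
    List<Node> replicas = Arrays.asList(new Dn("uuid-3", "host-b"), new Dn("uuid-2", "host-a"));
    System.out.println(names(sort(client, replicas, false))); // [uuid-3, uuid-2]
    System.out.println(names(sort(client, replicas, true)));  // [uuid-2, uuid-3]
  }
}
```

With equality only, both replicas get the same cost and the stable sort keeps the input order, so the same-host replica is not moved first. With the hostname fallback it is. This is the "less than ideal order" the issue describes, not a failure.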



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
