Stephen O'Donnell created HDDS-2249:
---------------------------------------

             Summary: SortDatanodes does not return correct orders when many 
DNs on a given host
                 Key: HDDS-2249
                 URL: https://issues.apache.org/jira/browse/HDDS-2249
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: SCM
    Affects Versions: 0.5.0
            Reporter: Stephen O'Donnell


In HDDS-2199 ScmNodeManager.getNodeByAddress() was changed to return a list of 
nodes rather than a single entry, to handle the case where many datanodes are 
running on the same host.

SCMBlockProtocol.sortDatanodes() uses the results returned from 
getNodesByAddress() to determine whether the client submitting the request is 
running on a cluster node. If it is, it attempts to sort the datanodes by 
distance from the client machine.

To do this, the code currently takes the first DatanodeDetails object returned 
by getNodesByAddress() and compares it with each of the nodes passed in. If a 
passed node is equal to the client node (based on object equality), the 
distance is zero; otherwise the distance is calculated from the topology.

The sort is performed in NetworkTopologyImpl.sortByDistanceCost(), which in 
turn calls NetworkTopologyImpl.getDistanceCost(), where the object comparison 
is performed:

{code}
if ((node1 != null && node2 != null && node1.equals(node2)) ||
 (node1 == null && node2 == null)) {
 return 0;
}
{code}

This does not always work when there are many datanodes on the same host. The 
first node returned from getNodesByAddress() is guaranteed to be on the same 
host as the client, but the list of passed datanodes may not include that 
particular datanode instance.
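
The effect can be reproduced with a small self-contained sketch. Everything in 
it is a simplified assumption for illustration: Dn is a hypothetical stand-in 
for DatanodeDetails (equality by UUID is assumed), the rack-based costs of 2 
and 4 are placeholders, and sortByDistanceCost() here is a plain stable sort 
rather than the real NetworkTopologyImpl code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SameHostSortDemo {
  /** Hypothetical stand-in for DatanodeDetails; equality by UUID is an assumption. */
  static final class Dn {
    final String uuid;
    final String host;
    final String rack;

    Dn(String uuid, String host, String rack) {
      this.uuid = uuid;
      this.host = host;
      this.rack = rack;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Dn && ((Dn) o).uuid.equals(uuid);
    }

    @Override
    public int hashCode() {
      return uuid.hashCode();
    }
  }

  /** Same shape as the quoted check; the rack-based costs are placeholders. */
  static int distanceCost(Dn node1, Dn node2) {
    if ((node1 != null && node2 != null && node1.equals(node2))
        || (node1 == null && node2 == null)) {
      return 0;
    }
    if (node1 == null || node2 == null) {
      return Integer.MAX_VALUE;
    }
    return node1.rack.equals(node2.rack) ? 2 : 4;
  }

  /** Stable sort of the passed nodes by their cost from the reader. */
  static List<Dn> sortByDistanceCost(Dn reader, List<Dn> nodes) {
    List<Dn> sorted = new ArrayList<>(nodes);
    sorted.sort(Comparator.comparingInt((Dn n) -> distanceCost(reader, n)));
    return sorted;
  }

  public static void main(String[] args) {
    Dn dnA = new Dn("a", "host1", "/rack1");
    Dn dnB = new Dn("b", "host1", "/rack1");
    Dn dnC = new Dn("c", "host2", "/rack1");
    // The client runs on host1; the address lookup yields [dnA, dnB], dnA is used.
    Dn reader = dnA;
    // The passed list contains dnB (same host as the client) but not dnA.
    for (Dn n : sortByDistanceCost(reader, Arrays.asList(dnC, dnB))) {
      System.out.println(n.uuid + " on " + n.host
          + " cost=" + distanceCost(reader, n));
    }
  }
}
```

In this sketch dnB is on the client's host but is not equal to dnA, so it 
gets the same cost as dnC on another host and is not moved to the front.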

To fix this, we should probably have getDistanceCost() compare hostnames or IP 
addresses, either as a second check or instead of the object equality. 
However, this is not trivial to implement.

The reason is that getDistanceCost() takes Node objects (not DatanodeDetails), 
and a Node does not have an IP or hostname field. It does have a 
getNetworkName() method, which should return the hostname, but that value is 
overwritten with the datanode's UUID when the datanode registers with the node 
manager, by this line in NodeManager.register():

datanodeDetails.setNetworkName(datanodeDetails.getUuidString());
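
For illustration only, the "second check" idea could look like the sketch 
below. It assumes the node type exposes a host value, which the real Node 
interface does not (and getNetworkName() holds the UUID after registration), so 
the Dn class, its host field and the placeholder costs are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class HostAwareDistanceSketch {
  /** Hypothetical node type that still knows its host; the real Node does not. */
  static final class Dn {
    final String uuid;
    final String host;
    final String rack;

    Dn(String uuid, String host, String rack) {
      this.uuid = uuid;
      this.host = host;
      this.rack = rack;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof Dn && ((Dn) o).uuid.equals(uuid);
    }

    @Override
    public int hashCode() {
      return uuid.hashCode();
    }
  }

  static int distanceCost(Dn node1, Dn node2) {
    if (node1 == null && node2 == null) {
      return 0;
    }
    if (node1 == null || node2 == null) {
      return Integer.MAX_VALUE;
    }
    // First check: the existing object equality test.
    if (node1.equals(node2)) {
      return 0;
    }
    // Second check (the proposal): another instance on the same host is local too.
    if (node1.host.equals(node2.host)) {
      return 0;
    }
    // Placeholder costs standing in for the real topology distance.
    return node1.rack.equals(node2.rack) ? 2 : 4;
  }

  /** Stable sort of the passed nodes by their cost from the reader. */
  static List<Dn> sortByDistanceCost(Dn reader, List<Dn> nodes) {
    List<Dn> sorted = new ArrayList<>(nodes);
    sorted.sort(Comparator.comparingInt((Dn n) -> distanceCost(reader, n)));
    return sorted;
  }

  public static void main(String[] args) {
    Dn dnA = new Dn("a", "host1", "/rack1");
    Dn dnB = new Dn("b", "host1", "/rack1");
    Dn dnC = new Dn("c", "host2", "/rack1");
    // Reader is dnA; the passed list holds dnB (same host) and dnC (other host).
    for (Dn n : sortByDistanceCost(dnA, Arrays.asList(dnC, dnB))) {
      System.out.println(n.uuid + " on " + n.host
          + " cost=" + distanceCost(dnA, n));
    }
  }
}
```

With the host comparison in place, a datanode on the client's host sorts 
ahead of one on another host even when it is not the instance picked as the 
client node.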

 

Note that this only affects test clusters where many DNs run on a single host, 
and it does not cause any failures. The DNs may simply be returned in a less 
than ideal order.


