[ https://issues.apache.org/jira/browse/HDDS-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Pogde updated HDDS-2249:
---------------------------------
    Target Version/s: 1.2.0

I am managing the 1.1.0 release and we currently have more than 600 issues 
targeted for 1.1.0. I am moving the target field to 1.2.0. 

If you are actively working on this jira and believe it should be targeted to 
the 1.1.0 release, please change the target field back to 1.1.0 before Feb 05, 
2021. 

> SortDatanodes does not return correct orders when many DNs on a given host
> --------------------------------------------------------------------------
>
>                 Key: HDDS-2249
>                 URL: https://issues.apache.org/jira/browse/HDDS-2249
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 0.5.0
>            Reporter: Stephen O'Donnell
>            Priority: Major
>              Labels: TriagePending
>
> In HDDS-2199 ScmNodeManager.getNodeByAddress() was changed to return a list 
> of nodes rather than a single entry, to handle the case where many datanodes 
> are running on the same host.
> In SCMBlockProtocol.sortDatanodes(), the results returned from 
> getNodesByAddress() are used to determine whether the client submitting the 
> request is running on a cluster node and, if it is, to sort the datanodes by 
> distance from the client machine.
> To do this, the code currently takes the first DatanodeDetails object 
> returned by getNodesByAddress() and then compares it with the other passed-in 
> nodes. If any of the passed-in nodes is equal to the client node (based on the 
> Java object ID), a zero distance is returned; otherwise the distance is 
> calculated.
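A minimal sketch of the failure described above, under stated assumptions: `SimpleNode` is a hypothetical stand-in for `DatanodeDetails` that relies on the default `Object.equals` (object identity), and `pickClient` / `zeroDistanceMatches` are illustrative helpers, not Ozone methods. Three datanodes share one IP address, the first one is picked as the client node, and the pipeline only contains the other two:

```java
import java.util.List;

/** Hypothetical model of the HDDS-2249 lookup; not the real Ozone classes. */
public class FirstNodeLookupDemo {

    /** Stand-in for DatanodeDetails; equals() is inherited, so it is object identity. */
    static final class SimpleNode {
        final String uuid;
        final String ipAddress;

        SimpleNode(String uuid, String ipAddress) {
            this.uuid = uuid;
            this.ipAddress = ipAddress;
        }
    }

    /** Mirrors taking element 0 of the getNodesByAddress() result as "the client". */
    static SimpleNode pickClient(List<SimpleNode> nodesOnClientHost) {
        return nodesOnClientHost.isEmpty() ? null : nodesOnClientHost.get(0);
    }

    /** Counts how many passed-in nodes would get a zero distance from the client. */
    static long zeroDistanceMatches(SimpleNode client, List<SimpleNode> passed) {
        return passed.stream().filter(n -> n.equals(client)).count();
    }

    public static void main(String[] args) {
        SimpleNode dn1 = new SimpleNode("dn-1", "10.0.0.5");
        SimpleNode dn2 = new SimpleNode("dn-2", "10.0.0.5");
        SimpleNode dn3 = new SimpleNode("dn-3", "10.0.0.5");
        SimpleNode remote = new SimpleNode("dn-4", "10.0.0.9");

        SimpleNode client = pickClient(List.of(dn1, dn2, dn3));
        // dn1 is not in the pipeline, so no node is treated as local,
        // even though dn2 and dn3 share the client's IP address.
        System.out.println(zeroDistanceMatches(client, List.of(remote, dn2, dn3))); // 0
    }
}
```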
> The sort is performed in NetworkTopologyImpl.sortByDistanceCost() which later 
> calls NetworkTopologyImpl.getDistanceCost() which is where the object 
> comparison is performed:
> {code}
> if ((node1 != null && node2 != null && node1.equals(node2)) ||
>  (node1 == null && node2 == null)) {
>  return 0;
> }
> {code}
> This does not always work when there are many datanodes on the same host: 
> the first node returned from getNodesByAddress() is guaranteed to be on the 
> same host as the client, but the list of passed-in datanodes may not include 
> that datanode instance.
> To fix this, we should probably have getDistanceCost() compare hostnames or 
> IP addresses, either as a second check or instead of the object equality 
> check. However, this is not trivial to implement.
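A sketch of the proposed second check, assuming a hypothetical `AddressedNode` type that carries an IP address (the real `Node` interface does not, which is the obstacle described next). The 2 and 4 costs for same-rack and cross-rack nodes, and returning `Integer.MAX_VALUE` when only one node is null, are simplifying assumptions, not the actual `NetworkTopologyImpl` logic:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of an address-aware distance check; not Ozone code. */
public class AddressAwareDistance {

    /** Assumed node type that exposes an IP address, unlike the real Node. */
    static final class AddressedNode {
        final String name;
        final String ipAddress;
        final String rack;

        AddressedNode(String name, String ipAddress, String rack) {
            this.name = name;
            this.ipAddress = ipAddress;
            this.rack = rack;
        }
    }

    /** Zero for the same object (existing check) or the same IP (proposed fallback). */
    static int distanceCost(AddressedNode a, AddressedNode b) {
        if ((a != null && b != null && a.equals(b)) || (a == null && b == null)) {
            return 0;
        }
        if (a == null || b == null) {
            return Integer.MAX_VALUE; // assumed handling when only one node is null
        }
        if (a.ipAddress.equals(b.ipAddress)) {
            return 0; // same host, different datanode instance
        }
        return a.rack.equals(b.rack) ? 2 : 4; // simplified topology cost
    }

    /** Sorts a copy of the nodes by their cost from the client node. */
    static List<AddressedNode> sortByDistance(AddressedNode client, List<AddressedNode> nodes) {
        List<AddressedNode> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingInt(n -> distanceCost(client, n)));
        return sorted;
    }

    public static void main(String[] args) {
        AddressedNode client = new AddressedNode("dn-1", "10.0.0.5", "/rack1");
        List<AddressedNode> pipeline = List.of(
            new AddressedNode("dn-4", "10.0.0.9", "/rack2"),
            new AddressedNode("dn-6", "10.0.0.6", "/rack1"),
            new AddressedNode("dn-2", "10.0.0.5", "/rack1"));
        List<String> order = sortByDistance(client, pipeline).stream()
            .map(n -> n.name)
            .collect(Collectors.toList());
        System.out.println(order); // [dn-2, dn-6, dn-4]
    }
}
```

Because `List.sort` is stable, nodes with equal cost keep their input order in this sketch; the real sort may order ties differently.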
> The reason is that getDistanceCost() takes Node objects (not 
> DatanodeDetails), and a Node does not have an IP or hostname field. It does 
> have a getNetworkName() method, which should return the hostname, but that 
> value is overwritten with the datanode's UUID when the datanode registers with 
> the node manager, by this line in NodeManager.register():
> {code}
> datanodeDetails.setNetworkName(datanodeDetails.getUuidString());
> {code}
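To illustrate why getNetworkName() cannot serve as a host comparison after registration, this sketch mimics the quoted register() line with a hypothetical `RegisteredNode` class (not the real `DatanodeDetails`). Two datanodes that start with the same hostname end up with different, UUID-based network names:

```java
import java.util.UUID;

/** Hypothetical illustration of the network name being replaced by the UUID. */
public class NetworkNameDemo {

    /** Minimal stand-in holding a UUID and a network name. */
    static final class RegisteredNode {
        private final String uuid = UUID.randomUUID().toString();
        private String networkName;

        RegisteredNode(String hostName) {
            this.networkName = hostName; // starts out as the hostname
        }

        String getUuidString() { return uuid; }
        String getNetworkName() { return networkName; }
        void setNetworkName(String name) { this.networkName = name; }
    }

    /** Mimics the line quoted from NodeManager.register(). */
    static void register(RegisteredNode dn) {
        dn.setNetworkName(dn.getUuidString());
    }

    /** A name-based "same host" check, which stops working after register(). */
    static boolean sameHostByNetworkName(RegisteredNode a, RegisteredNode b) {
        return a.getNetworkName().equals(b.getNetworkName());
    }

    public static void main(String[] args) {
        RegisteredNode dn1 = new RegisteredNode("host-a.example.com");
        RegisteredNode dn2 = new RegisteredNode("host-a.example.com");
        System.out.println(sameHostByNetworkName(dn1, dn2)); // true
        register(dn1);
        register(dn2);
        System.out.println(sameHostByNetworkName(dn1, dn2)); // false
    }
}
```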
>  
> Note that this only affects test clusters where many DNs are on a single 
> host, and it does not cause any failures; the DNs may simply be returned in a 
> less than ideal order.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
