[
https://issues.apache.org/jira/browse/HDDS-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944694#comment-16944694
]
Siddharth Wagle commented on HDDS-2249:
---------------------------------------
+ [~sammichen] for the network topology suggestions.
> SortDatanodes does not return correct orders when many DNs on a given host
> --------------------------------------------------------------------------
>
> Key: HDDS-2249
> URL: https://issues.apache.org/jira/browse/HDDS-2249
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Affects Versions: 0.5.0
> Reporter: Stephen O'Donnell
> Priority: Major
>
> In HDDS-2199 ScmNodeManager.getNodeByAddress() was changed to return a list
> of nodes rather than a single entry, to handle the case where many datanodes
> are running on the same host.
> SCMBlockProtocol.sortDatanodes() uses the results returned from
> getNodesByAddress to determine whether the client submitting the request is
> running on a cluster node. If it is, it attempts to sort the datanodes by
> distance from the client machine.
> To do this, the code currently takes the first DatanodeDetails object
> returned by getNodesByAddress and compares it with each of the passed-in
> nodes. If a passed node is equal to the client node (based on the Java
> object ID), it returns a zero distance; otherwise the distance is
> calculated.
> The sort is performed in NetworkTopologyImpl.sortByDistanceCost(), which
> later calls NetworkTopologyImpl.getDistanceCost(), where the object
> comparison is performed:
> {code}
> if ((node1 != null && node2 != null && node1.equals(node2)) ||
>     (node1 == null && node2 == null)) {
>   return 0;
> }
> {code}
> This does not always work when there are many datanodes on the same host.
> The first node returned from getNodesByAddress() is guaranteed to be on the
> same host as the client, but the list of passed datanodes may not include
> that datanode instance.
> To fix this, we should probably have getDistanceCost() compare hostnames or
> IPs, either as a second check or instead of the object equality. However,
> this is not trivial to implement.
> The reason is that getDistanceCost() takes Node objects (not
> DatanodeDetails), and a Node does not have an IP or hostname field. It does
> have a getNetworkName method, which should return the hostname, but that
> value is overwritten with the datanode's UUID when the datanode registers
> with the node manager, by this line in NodeManager.register():
> datanodeDetails.setNetworkName(datanodeDetails.getUuidString());
>
> Note that this only affects test clusters where many DNs run on a single
> host, and it does not cause any failures. The DNs may simply be returned in
> a less-than-ideal order.
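
The behaviour described in the quoted issue can be reproduced in isolation. The sketch below is not HDDS code: SketchNode, DistanceSketch, the hostName field and the OTHER_HOST_COST value are all hypothetical stand-ins, chosen only to show why an equality-only check misses a co-located datanode and how a hostname comparison as a second check could change the order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a datanode; NOT the real DatanodeDetails or Node.
final class SketchNode {
  final String uuid;     // plays the role of the network name set at registration
  final String hostName; // assumed field; the issue notes that Node has no such field

  SketchNode(String uuid, String hostName) {
    this.uuid = uuid;
    this.hostName = hostName;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof SketchNode && ((SketchNode) o).uuid.equals(uuid);
  }

  @Override
  public int hashCode() {
    return uuid.hashCode();
  }
}

final class DistanceSketch {
  // Placeholder cost for "not the same node"; real topology costs differ.
  static final int OTHER_HOST_COST = 2;

  // Mirrors the equality-only check quoted in the issue.
  static int equalityOnlyCost(SketchNode a, SketchNode b) {
    if ((a != null && b != null && a.equals(b)) || (a == null && b == null)) {
      return 0;
    }
    return OTHER_HOST_COST;
  }

  // Assumed variant: fall back to a hostname comparison as a second check.
  static int hostAwareCost(SketchNode a, SketchNode b) {
    if (equalityOnlyCost(a, b) == 0) {
      return 0;
    }
    if (a != null && b != null && a.hostName.equals(b.hostName)) {
      return 0;
    }
    return OTHER_HOST_COST;
  }

  // Stable sort by cost from the client node; ties keep their input order.
  static List<SketchNode> sortByCost(SketchNode client, List<SketchNode> nodes,
      boolean hostAware) {
    List<SketchNode> sorted = new ArrayList<>(nodes);
    sorted.sort(Comparator.comparingInt((SketchNode n) ->
        hostAware ? hostAwareCost(client, n) : equalityOnlyCost(client, n)));
    return sorted;
  }

  public static void main(String[] args) {
    SketchNode dnA = new SketchNode("uuid-a", "host-1"); // first node found for the client
    SketchNode dnB = new SketchNode("uuid-b", "host-1"); // co-located with dnA
    SketchNode dnC = new SketchNode("uuid-c", "host-2");
    List<SketchNode> passed = Arrays.asList(dnC, dnB);   // dnA itself is not in the list

    System.out.println(sortByCost(dnA, passed, false).get(0).uuid);
    System.out.println(sortByCost(dnA, passed, true).get(0).uuid);
  }
}
```

With the equality-only check, neither passed node equals dnA, so both get the same cost and the input order is kept. With the hostname fallback, dnB sorts first because it shares host-1 with the client node. Whether a hostname can be made available to getDistanceCost() at all is the open question raised in the issue.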