[ https://issues.apache.org/jira/browse/HDFS-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731583#comment-16731583 ]
Ayush Saxena commented on HDFS-14181: ------------------------------------- Thanx [~sihai] for the patch. I too agree with [~elgoiri] that we should implement the correct way. I tried digging into the problem. Thats something what I understood of the scenario.Let me Know If I get it wrong somewhere. Let the setup be like :: (Cluster Map) 1 2 3 4 5 6 7 8 9 10 1-8 is scope 1-4 is excluded scope 2 and 7 is excluded node * *Present Scenario ::* Avlbl node (Present)--> 5,6,8,9,10 (5) _{Cluster Map - Excluded Scope and Excluded nodes out of Excluded Scope} \{ 10 - 4 - 1 = 5}_ Avlbl node (Expected) ---> 5,6,8 (*3*) _{Scope - Exclude Scope - Excluded nodes out of Excluded Scope but in scope }_ { 8 - 4 - 1 = 3} * *Proposed Solution ::* Part1 --> 1,3,4,5,6,8 (6) _{Scope - Excluded Nodes in scope}_ Part2 --> 1,3,4 (3) _{Excluded Scope - Excluded Nodes in Excluded Scope }_ *Solution Received* = { Part1 - Part2 } 6-3 = *3* (*Correct*) By my understanding the solution seems correct. :) Just I guess If we are expecting this method to be called from somewhere else too.We can add a NULL check for scope too in the else part for safety as before scope wasn't being used in the block. Secondly Am not very sure regarding taking the read Lock. > Suspect there is a bug in NetworkTopology.java chooseRandom function. > --------------------------------------------------------------------- > > Key: HDFS-14181 > URL: https://issues.apache.org/jira/browse/HDFS-14181 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode > Affects Versions: 2.9.2 > Reporter: Sihai Ke > Priority: Major > Attachments: 0001-add-UT-for-NetworkTopology.patch, > 0001-fix-NetworkTopology.java-chooseRandom-bug.patch, HDFS-14181.01.patch, > HDFS-14181.02.patch, image-2018-12-29-15-02-19-415.png > > > During reading the hadoop NetworkTopology.java, I suspect there is a bug in > function > chooseRandom (line 498, hadoop version 2.9.2-RC0), > I think there is a bug in{color:#f79232} code, ~excludedScope doesn't mean > availableNodes under Scope node, and I also add unit test for this and get an > exception.{color} > bug code in the else. > {code:java} > // code placeholder > if (excludedScope == null) { > availableNodes = countNumOfAvailableNodes(scope, excludedNodes); > } else { > availableNodes = > countNumOfAvailableNodes("~" + excludedScope, excludedNodes); > }{code} > Source code: > {code:java} > // code placeholder > protected Node chooseRandom(final String scope, String excludedScope, > final Collection<Node> excludedNodes) { > if (excludedScope != null) { > if (scope.startsWith(excludedScope)) { > return null; > } > if (!excludedScope.startsWith(scope)) { > excludedScope = null; > } > } > Node node = getNode(scope); > if (!(node instanceof InnerNode)) { > return excludedNodes != null && excludedNodes.contains(node) ? > null : node; > } > InnerNode innerNode = (InnerNode)node; > int numOfDatanodes = innerNode.getNumOfLeaves(); > if (excludedScope == null) { > node = null; > } else { > node = getNode(excludedScope); > if (!(node instanceof InnerNode)) { > numOfDatanodes -= 1; > } else { > numOfDatanodes -= ((InnerNode)node).getNumOfLeaves(); > } > } > if (numOfDatanodes <= 0) { > LOG.debug("Failed to find datanode (scope=\"{}\" excludedScope=\"{}\")." > + " numOfDatanodes={}", > scope, excludedScope, numOfDatanodes); > return null; > } > final int availableNodes; > if (excludedScope == null) { > availableNodes = countNumOfAvailableNodes(scope, excludedNodes); > } else { > availableNodes = > countNumOfAvailableNodes("~" + excludedScope, excludedNodes); > } > LOG.debug("Choosing random from {} available nodes on node {}," > + " scope={}, excludedScope={}, excludeNodes={}. numOfDatanodes={}.", > availableNodes, innerNode, scope, excludedScope, excludedNodes, > numOfDatanodes); > Node ret = null; > if (availableNodes > 0) { > ret = chooseRandom(innerNode, node, excludedNodes, numOfDatanodes, > availableNodes); > } > LOG.debug("chooseRandom returning {}", ret); > return ret; > } > {code} > > > Add Unit Test in TestClusterTopology.java, but get exception. > > {code:java} > // code placeholder > @Test > public void testChooseRandom1() { > // create the topology > NetworkTopology cluster = NetworkTopology.getInstance(new Configuration()); > NodeElement node1 = getNewNode("node1", "/a1/b1/c1"); > cluster.add(node1); > NodeElement node2 = getNewNode("node2", "/a1/b1/c1"); > cluster.add(node2); > NodeElement node3 = getNewNode("node3", "/a1/b1/c2"); > cluster.add(node3); > NodeElement node4 = getNewNode("node4", "/a1/b2/c3"); > cluster.add(node4); > Node node = cluster.chooseRandom("/a1/b1", "/a1/b1/c1", null); > assertSame(node.getName(), "node3"); > } > {code} > > Exception: > {code:java} > // code placeholder > java.lang.IllegalArgumentException: 1 should >= 2, and both should be > positive. > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at > org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:567) > at > org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:544) > atorg.apache.hadoop.net.TestClusterTopology.testChooseRandom1(TestClusterTopology.java:198) > {code} > > {color:#f79232}!image-2018-12-29-15-02-19-415.png!{color} > > > [~vagarychen] this change is imported in PR HDFS-11577, could you help to > check whether this is a bug ? > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org