[ https://issues.apache.org/jira/browse/HDFS-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nikola Vujic resolved HDFS-5184.
--------------------------------
    Resolution: Done

This is fixed in HDP 2 with the new implementation of the block placement policy with node group.

> BlockPlacementPolicyWithNodeGroup does not work correctly when avoidStaleNodes is true
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-5184
>                 URL: https://issues.apache.org/jira/browse/HDFS-5184
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Nikola Vujic
>            Priority: Minor
>
> If avoidStaleNodes is true, choosing targets is potentially done in two attempts. If the first attempt does not find enough targets to place all replicas, a second attempt is invoked that allows stale nodes in order to find the remaining targets. This second attempt breaks the node-group rule of never placing two replicas in the same node group.
> The invocation of the second attempt looks like this:
> {code}
> DatanodeDescriptor chooseTarget(excludedNodes, ...) {
>   HashMap<Node, Node> oldExcludedNodes = new HashMap<Node, Node>(excludedNodes);
>   // first attempt
>   // if we don't find enough targets then
>   if (avoidStaleNodes) {
>     for (Node node : results) {
>       oldExcludedNodes.put(node, node);
>     }
>     numOfReplicas = totalReplicasExpected - results.size();
>     return chooseTarget(numOfReplicas, writer, oldExcludedNodes, blocksize,
>         maxNodesPerRack, results, false);
>   }
> }
> {code}
> All nodes excluded during the first attempt that are neither in oldExcludedNodes nor in results are therefore ignored, and the second invocation of chooseTarget runs with an incomplete set of excluded nodes. For example, take the following topology:
> dn1 -> /d1/r1/n1
> dn2 -> /d1/r1/n1
> dn3 -> /d1/r1/n2
> dn4 -> /d1/r1/n2
> If we want to choose 3 targets with avoidStaleNodes=true, the first attempt can choose only 2 targets because there are only two node groups. Say it chooses dn1 and dn3. We then add dn1 and dn3 to oldExcludedNodes and use that set of excluded nodes in the second attempt. This set of excluded nodes is incomplete and allows dn2 and dn4 to be selected in the second attempt, even though node-group awareness should rule them out. That is exactly what happens in the current code.
> Repro:
> - add CONF.setBoolean(DFSConfigKeys.DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_WRITE_KEY, true); to TestReplicationPolicyWithNodeGroup.
> - testChooseMoreTargetsThanNodeGroups() should fail.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
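
To make the gap concrete, here is a minimal standalone Java sketch, not the HDFS code: the class name, the topology map, and the helpers nodeGroupOf and excludePeers are invented for illustration, and the node names mirror the example in the description. It shows how the snapshot-based exclusion set handed to the second attempt misses the node-group exclusions accumulated during the first attempt.

{code}
import java.util.*;

/**
 * Standalone illustration (not HDFS code) of why reusing the pre-attempt
 * snapshot of excluded nodes breaks node-group awareness.
 */
public class StaleExcludeSketch {

  // dnX -> node group, as in the issue description
  static final Map<String, String> TOPOLOGY = new LinkedHashMap<>();
  static {
    TOPOLOGY.put("dn1", "/d1/r1/n1");
    TOPOLOGY.put("dn2", "/d1/r1/n1");
    TOPOLOGY.put("dn3", "/d1/r1/n2");
    TOPOLOGY.put("dn4", "/d1/r1/n2");
  }

  static String nodeGroupOf(String dn) {
    return TOPOLOGY.get(dn);
  }

  /** Exclude every node that shares a node group with the chosen node. */
  static void excludePeers(String chosen, Set<String> excluded) {
    for (Map.Entry<String, String> e : TOPOLOGY.entrySet()) {
      if (e.getValue().equals(nodeGroupOf(chosen))) {
        excluded.add(e.getKey());
      }
    }
  }

  public static void main(String[] args) {
    Set<String> excluded = new HashSet<>();            // updated during attempt 1
    Set<String> oldExcluded = new HashSet<>(excluded); // snapshot, as in the report

    // Attempt 1: only two node groups exist, so only two targets can be chosen.
    List<String> results = new ArrayList<>(Arrays.asList("dn1", "dn3"));
    for (String chosen : results) {
      excludePeers(chosen, excluded);  // dn2 and dn4 become excluded as well
    }

    // Attempt 2 as described in the issue: snapshot plus results only.
    Set<String> secondAttemptExcluded = new HashSet<>(oldExcluded);
    secondAttemptExcluded.addAll(results);

    System.out.println("full exclusions after attempt 1: " + excluded);
    System.out.println("exclusions passed to attempt 2:  " + secondAttemptExcluded);
    // dn2 and dn4 are missing from the second set, so attempt 2 may pick them
    // even though they share a node group with dn1 and dn3.
  }
}
{code}

Running the sketch shows the full exclusion set containing dn1 through dn4 after the first attempt, while the set actually passed to the second attempt contains only dn1 and dn3, which is why dn2 and dn4 remain selectable.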