[
https://issues.apache.org/jira/browse/HDFS-16456?focusedWorklogId=753952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753952
]
ASF GitHub Bot logged work on HDFS-16456:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Apr/22 10:22
Start Date: 07/Apr/22 10:22
Worklog Time Spent: 10m
Work Description: tasanuma commented on code in PR #4126:
URL: https://github.com/apache/hadoop/pull/4126#discussion_r844954667
##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBlockPlacementPolicyRackFaultTolerant.java:
##########
@@ -167,6 +179,107 @@ private void doTestChooseTargetSpecialCase() throws Exception {
}
}
+ /**
+ * Verify decommission a dn which is an only node in its rack.
+ */
+ public void testPlacementWithOnlyOneNodeInRackDecommission() throws Exception {
Review Comment:
Instead of adding `testPlacementWithOnlyOneNodeInRackDecommission()` to
`testChooseTarget()`, it would be better to make
`testPlacementWithOnlyOneNodeInRackDecommission()` a separate unit test method
(annotated with `@Test`).
```suggestion
@Test
  public void testPlacementWithOnlyOneNodeInRackDecommission() throws Exception {
```
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java:
##########
@@ -101,6 +101,13 @@ protected NetworkTopology init(InnerNode.Factory factory) {
private int depthOfAllLeaves = -1;
/** rack counter */
protected int numOfRacks = 0;
+ /** empty rack map, rackname->nodenumber. */
+ private HashMap<String, Set<String>> emptyRackMap =
Review Comment:
This is a rack map used to calculate the number of empty racks, so I would
prefer the name `rackMap`.
```suggestion
private HashMap<String, Set<String>> rackMap =
```
##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBlockPlacementPolicyRackFaultTolerant.java:
##########
@@ -167,6 +179,107 @@ private void doTestChooseTargetSpecialCase() throws Exception {
}
}
+ /**
+ * Verify decommission a dn which is an only node in its rack.
+ */
+ public void testPlacementWithOnlyOneNodeInRackDecommission() throws Exception {
+ Configuration conf = new HdfsConfiguration();
+ final String[] racks = {"/RACK0", "/RACK0", "/RACK2", "/RACK3", "/RACK4", "/RACK5", "/RACK2"};
+ final String[] hosts = {"/host0", "/host1", "/host2", "/host3", "/host4", "/host5", "/host6"};
+
+ // enables DFSNetworkTopology
+ conf.setClass(DFSConfigKeys.DFS_BLOCK_REPLICATOR_CLASSNAME_KEY,
+ BlockPlacementPolicyRackFaultTolerant.class,
+ BlockPlacementPolicy.class);
+ conf.setBoolean(DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY, true);
+ conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, DEFAULT_BLOCK_SIZE);
+ conf.setInt(DFSConfigKeys.DFS_BYTES_PER_CHECKSUM_KEY,
+ DEFAULT_BLOCK_SIZE / 2);
+
+ if (cluster != null) {
+ cluster.shutdown();
+ }
+ cluster = new MiniDFSCluster.Builder(conf).numDataNodes(7).racks(racks)
+ .hosts(hosts).build();
+ cluster.waitActive();
+ nameNodeRpc = cluster.getNameNodeRpc();
+ namesystem = cluster.getNamesystem();
+ DistributedFileSystem fs = cluster.getFileSystem();
+ fs.enableErasureCodingPolicy("RS-3-2-1024k");
+ fs.setErasureCodingPolicy(new Path("/"), "RS-3-2-1024k");
+
+ final BlockManager bm = cluster.getNamesystem().getBlockManager();
+ final DatanodeManager dm = bm.getDatanodeManager();
+ assertTrue(dm.getNetworkTopology() instanceof DFSNetworkTopology);
+
+ String clientMachine = "/host4";
+ String clientRack = "/RACK4";
+ String src = "/test";
+
+ final DatanodeManager dnm = namesystem.getBlockManager().getDatanodeManager();
+ DatanodeDescriptor dnd4 = dnm.getDatanode(cluster.getDataNodes().get(4).getDatanodeId());
+ assertEquals(dnd4.getNetworkLocation(), clientRack);
+ dnm.getDatanodeAdminManager().startDecommission(dnd4);
+ short replication = 5;
+ short additionalReplication = 1;
+
+ try {
+ // Create the file with client machine
+ HdfsFileStatus fileStatus = namesystem.startFile(src, perm,
+ clientMachine, clientMachine, EnumSet.of(CreateFlag.CREATE), true,
+ replication, DEFAULT_BLOCK_SIZE * 1024 * 10, null, null, null, false);
+
+ //test chooseTarget for new file
+ LocatedBlock locatedBlock = nameNodeRpc.addBlock(src, clientMachine,
+ null, null, fileStatus.getFileId(), null, null);
+ HashMap<String, Integer> racksCount = new HashMap<String, Integer>();
+ doTestLocatedBlockRacks(racksCount, replication, 4, locatedBlock);
+
+ //test chooseTarget for existing file.
+ LocatedBlock additionalLocatedBlock =
+ nameNodeRpc.getAdditionalDatanode(src, fileStatus.getFileId(),
+ locatedBlock.getBlock(), locatedBlock.getLocations(),
+ locatedBlock.getStorageIDs(), DatanodeInfo.EMPTY_ARRAY,
+ additionalReplication, clientMachine);
+
+ racksCount.clear();
+ doTestLocatedBlockRacks(racksCount, additionalReplication + replication,
+ 4, additionalLocatedBlock);
+ assertEquals(racksCount.get("/RACK0"), (Integer)2);
+ assertEquals(racksCount.get("/RACK2"), (Integer)2);
+ } finally {
+ dnm.getDatanodeAdminManager().stopDecommission(dnd4);
+ }
+
+ //test if decommission successed
Review Comment:
```suggestion
//test if decommission succeeded
```
##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBlockPlacementPolicyRackFaultTolerant.java:
##########
@@ -167,6 +179,107 @@ private void doTestChooseTargetSpecialCase() throws Exception {
}
}
+ /**
+ * Verify decommission a dn which is an only node in its rack.
+ */
+ public void testPlacementWithOnlyOneNodeInRackDecommission() throws Exception {
+ Configuration conf = new HdfsConfiguration();
+ final String[] racks = {"/RACK0", "/RACK0", "/RACK2", "/RACK3", "/RACK4", "/RACK5", "/RACK2"};
+ final String[] hosts = {"/host0", "/host1", "/host2", "/host3", "/host4", "/host5", "/host6"};
+
+ // enables DFSNetworkTopology
+ conf.setClass(DFSConfigKeys.DFS_BLOCK_REPLICATOR_CLASSNAME_KEY,
+ BlockPlacementPolicyRackFaultTolerant.class,
+ BlockPlacementPolicy.class);
+ conf.setBoolean(DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY, true);
+ conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, DEFAULT_BLOCK_SIZE);
+ conf.setInt(DFSConfigKeys.DFS_BYTES_PER_CHECKSUM_KEY,
+ DEFAULT_BLOCK_SIZE / 2);
+
+ if (cluster != null) {
+ cluster.shutdown();
+ }
+ cluster = new MiniDFSCluster.Builder(conf).numDataNodes(7).racks(racks)
+ .hosts(hosts).build();
+ cluster.waitActive();
+ nameNodeRpc = cluster.getNameNodeRpc();
+ namesystem = cluster.getNamesystem();
+ DistributedFileSystem fs = cluster.getFileSystem();
+ fs.enableErasureCodingPolicy("RS-3-2-1024k");
+ fs.setErasureCodingPolicy(new Path("/"), "RS-3-2-1024k");
+
+ final BlockManager bm = cluster.getNamesystem().getBlockManager();
+ final DatanodeManager dm = bm.getDatanodeManager();
+ assertTrue(dm.getNetworkTopology() instanceof DFSNetworkTopology);
+
+ String clientMachine = "/host4";
+ String clientRack = "/RACK4";
+ String src = "/test";
+
+ final DatanodeManager dnm = namesystem.getBlockManager().getDatanodeManager();
+ DatanodeDescriptor dnd4 = dnm.getDatanode(cluster.getDataNodes().get(4).getDatanodeId());
+ assertEquals(dnd4.getNetworkLocation(), clientRack);
+ dnm.getDatanodeAdminManager().startDecommission(dnd4);
+ short replication = 5;
+ short additionalReplication = 1;
+
+ try {
+ // Create the file with client machine
+ HdfsFileStatus fileStatus = namesystem.startFile(src, perm,
+ clientMachine, clientMachine, EnumSet.of(CreateFlag.CREATE), true,
+ replication, DEFAULT_BLOCK_SIZE * 1024 * 10, null, null, null, false);
+
+ //test chooseTarget for new file
+ LocatedBlock locatedBlock = nameNodeRpc.addBlock(src, clientMachine,
+ null, null, fileStatus.getFileId(), null, null);
+ HashMap<String, Integer> racksCount = new HashMap<String, Integer>();
+ doTestLocatedBlockRacks(racksCount, replication, 4, locatedBlock);
+
+ //test chooseTarget for existing file.
+ LocatedBlock additionalLocatedBlock =
+ nameNodeRpc.getAdditionalDatanode(src, fileStatus.getFileId(),
+ locatedBlock.getBlock(), locatedBlock.getLocations(),
+ locatedBlock.getStorageIDs(), DatanodeInfo.EMPTY_ARRAY,
+ additionalReplication, clientMachine);
+
+ racksCount.clear();
+ doTestLocatedBlockRacks(racksCount, additionalReplication + replication,
+ 4, additionalLocatedBlock);
+ assertEquals(racksCount.get("/RACK0"), (Integer)2);
+ assertEquals(racksCount.get("/RACK2"), (Integer)2);
+ } finally {
+ dnm.getDatanodeAdminManager().stopDecommission(dnd4);
+ }
+
+ //test if decommission successed
+ DatanodeDescriptor dnd3 = dnm.getDatanode(cluster.getDataNodes().get(3).getDatanodeId());
+ cluster.getNamesystem().writeLock();
+ try {
+ dm.getDatanodeAdminManager().startDecommission(dnd3);
+ } finally {
+ cluster.getNamesystem().writeUnlock();
+ }
+
+ // make sure the decommission finishes and the block is on 6 racks
Review Comment:
maybe
```suggestion
      // make sure the decommission finishes and the block is on 4 racks
```
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java:
##########
@@ -1015,4 +1024,108 @@ protected static boolean isNodeInScope(Node node, String scope) {
String nodeLocation = NodeBase.getPath(node) + NodeBase.PATH_SEPARATOR_STR;
return nodeLocation.startsWith(scope);
}
-}
\ No newline at end of file
+
+ /** @return the number of nonempty racks */
+ public int getNumOfNonEmptyRacks() {
+ return numOfRacks - numOfEmptyRacks;
+ }
+
+ /**
+ * Update empty rack number when add a node like recommision.
Review Comment:
```suggestion
* Update empty rack number when add a node like recommission.
```
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java:
##########
@@ -1015,4 +1024,108 @@ protected static boolean isNodeInScope(Node node, String scope) {
String nodeLocation = NodeBase.getPath(node) + NodeBase.PATH_SEPARATOR_STR;
return nodeLocation.startsWith(scope);
}
-}
\ No newline at end of file
+
+ /** @return the number of nonempty racks */
+ public int getNumOfNonEmptyRacks() {
+ return numOfRacks - numOfEmptyRacks;
+ }
+
+ /**
+ * Update empty rack number when add a node like recommision.
+ * @param node node to be added; can be null
+ */
+ public void recommissionNode(Node node) {
+ if (node == null) {
+ return;
+ }
+ if (node instanceof InnerNode) {
+ throw new IllegalArgumentException(
+ "Not allow to remove an inner node: " + NodeBase.getPath(node));
+ }
+ netlock.writeLock().lock();
+ try {
+ decommissionNodes.remove(node.getName());
+ interAddNodeWithEmptyRack(node);
+ } finally {
+ netlock.writeLock().unlock();
+ }
+ }
+
+ /**
+ * Update empty rack number when remove a node like decommision.
Review Comment:
```suggestion
* Update empty rack number when remove a node like decommission.
```
Issue Time Tracking
-------------------
Worklog Id: (was: 753952)
Time Spent: 1h (was: 50m)
> EC: Decommission a rack with only one dn will fail when the rack number is
> equal to the replication factor
> ------------------------------------------------------------------------------------------------
>
> Key: HDFS-16456
> URL: https://issues.apache.org/jira/browse/HDFS-16456
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ec, namenode
> Affects Versions: 3.4.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Critical
> Labels: pull-request-available
> Attachments: HDFS-16456.001.patch, HDFS-16456.002.patch,
> HDFS-16456.003.patch, HDFS-16456.004.patch, HDFS-16456.005.patch,
> HDFS-16456.006.patch, HDFS-16456.007.patch, HDFS-16456.008.patch,
> HDFS-16456.009.patch, HDFS-16456.010.patch
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> In the scenario below, decommission will fail with the TOO_MANY_NODES_ON_RACK reason:
> # Enable an EC policy, such as RS-6-3-1024k.
> # The number of racks in the cluster is equal to or less than the replication
> number (9).
> # A rack has only one DN, and that DN is decommissioned.
> The root cause is in the
> BlockPlacementPolicyRackFaultTolerant::getMaxNodesPerRack() function, which
> computes a limit parameter maxNodesPerRack for choosing targets. In this
> scenario, maxNodesPerRack is 1, which means only one datanode can be chosen
> from each rack.
> {code:java}
> protected int[] getMaxNodesPerRack(int numOfChosen, int numOfReplicas) {
> ...
> // If more replicas than racks, evenly spread the replicas.
> // This calculation rounds up.
> int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 1;
> return new int[] {numOfReplicas, maxNodesPerRack};
> } {code}
> The line int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 1; is
> evaluated here with totalNumOfReplicas=9 and numOfRacks=9.
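> To make the arithmetic concrete (this is just the formula above evaluated
> with the reporter's numbers, not code from the patch):
> {code:java}
> // integer division: (9 - 1) / 9 + 1 == 0 + 1 == 1
> int totalNumOfReplicas = 9, numOfRacks = 9;
> int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 1; // == 1
> // so at most one replica may be placed on each rack
> {code}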
> When we decommission a dn that is the only node in its rack, chooseOnce() in
> BlockPlacementPolicyRackFaultTolerant::chooseTargetInOrder() will throw
> NotEnoughReplicasException, but the exception is not caught, so the code fails
> to fall back to the chooseEvenlyFromRemainingRacks() function.
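> A minimal, self-contained sketch of the missing fallback structure (hypothetical
> method names stand in for chooseOnce() and chooseEvenlyFromRemainingRacks();
> the real parameter lists are omitted):
> {code:java}
> class FallbackSketch {
>   static class NotEnoughReplicasException extends Exception { }
>
>   void chooseTargetInOrder() {
>     try {
>       choosePerRack();            // may throw when a rack has no usable node
>     } catch (NotEnoughReplicasException e) {
>       chooseFromRemainingRacks(); // without this catch, the exception escapes
>     }
>   }
>
>   void choosePerRack() throws NotEnoughReplicasException {
>     throw new NotEnoughReplicasException(); // simulate the decommissioning-only rack
>   }
>
>   void chooseFromRemainingRacks() {
>     // spread the remaining replicas over racks that still have usable nodes
>   }
> }
> {code}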
> During decommission, after targets are chosen, the verifyBlockPlacement()
> function returns a total rack count that includes the invalid rack, so
> BlockPlacementStatusDefault::isPlacementPolicySatisfied() returns false, which
> also causes the decommission to fail.
> {code:java}
> public BlockPlacementStatus verifyBlockPlacement(DatanodeInfo[] locs,
> int numberOfReplicas) {
> if (locs == null)
> locs = DatanodeDescriptor.EMPTY_ARRAY;
> if (!clusterMap.hasClusterEverBeenMultiRack()) {
> // only one rack
> return new BlockPlacementStatusDefault(1, 1, 1);
> }
> // Count locations on different racks.
> Set<String> racks = new HashSet<>();
> for (DatanodeInfo dn : locs) {
> racks.add(dn.getNetworkLocation());
> }
> return new BlockPlacementStatusDefault(racks.size(), numberOfReplicas,
> clusterMap.getNumOfRacks());
> } {code}
> {code:java}
> public boolean isPlacementPolicySatisfied() {
> return requiredRacks <= currentRacks || currentRacks >= totalRacks;
> }{code}
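> With hypothetical values for the scenario above (9 replicas required, 9 racks
> counted by the topology, but only 8 racks actually usable because one rack
> holds nothing but the decommissioning node), the check evaluates to false:
> {code:java}
> int currentRacks = 8, requiredRacks = 9, totalRacks = 9;
> boolean satisfied = requiredRacks <= currentRacks || currentRacks >= totalRacks;
> // satisfied == false, so the block placement is reported as unsatisfied
> {code}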
> According to the above description, we should make the following modifications to fix it:
> # In startDecommission() or stopDecommission(), we should also update
> numOfRacks in class NetworkTopology. Otherwise, choosing targets may fail
> because maxNodesPerRack is too small, and even if choosing targets succeeds,
> isPlacementPolicySatisfied() will still return false and cause the
> decommission to fail.
> # In BlockPlacementPolicyRackFaultTolerant::chooseTargetInOrder(), the first
> chooseOnce() call should also be wrapped in try...catch, or it will not fall
> back to chooseEvenlyFromRemainingRacks() when an exception is thrown.
> # In verifyBlockPlacement(), we need to exclude invalid racks from the total
> numOfRacks, or isPlacementPolicySatisfied() will return false and cause data
> reconstruction to fail (see the sketch after this list).
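> As an illustration of modification 3 only (a sketch, not the exact patch),
> verifyBlockPlacement() above could report the number of non-empty racks,
> using the getNumOfNonEmptyRacks() accessor that this PR adds to
> NetworkTopology, instead of the raw rack count:
> {code:java}
> public BlockPlacementStatus verifyBlockPlacement(DatanodeInfo[] locs,
>     int numberOfReplicas) {
>   if (locs == null)
>     locs = DatanodeDescriptor.EMPTY_ARRAY;
>   if (!clusterMap.hasClusterEverBeenMultiRack()) {
>     // only one rack
>     return new BlockPlacementStatusDefault(1, 1, 1);
>   }
>   // Count locations on different racks.
>   Set<String> racks = new HashSet<>();
>   for (DatanodeInfo dn : locs) {
>     racks.add(dn.getNetworkLocation());
>   }
>   // sketch: exclude racks emptied by decommission so they no longer
>   // inflate the total used by isPlacementPolicySatisfied()
>   return new BlockPlacementStatusDefault(racks.size(), numberOfReplicas,
>       clusterMap.getNumOfNonEmptyRacks());
> }
> {code}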
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)