[ https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=641488&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641488 ]

ASF GitHub Bot logged work on HDFS-16182:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Aug/21 03:39
            Start Date: 25/Aug/21 03:39
    Worklog Time Spent: 10m 
      Work Description: jojochuang commented on a change in pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320#discussion_r695335223



##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
##########
@@ -469,7 +469,7 @@ private Node chooseTarget(int numOfReplicas,
     LOG.trace("storageTypes={}", storageTypes);
 
     try {
-      if ((numOfReplicas = requiredStorageTypes.size()) == 0) {

Review comment:
       better to declare numOfReplicas a final variable at line 438.
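For illustration, a minimal standalone sketch of the suggestion (hypothetical names and context; not the actual BlockPlacementPolicyDefault source, where numOfReplicas is a parameter of chooseTarget):

```
import java.util.List;

// Hypothetical sketch: with numOfReplicas declared final, the compiler
// rejects the in-condition reassignment that the PR removes.
public class FinalReplicasSketch {

  static int chooseTargetCount(final int numOfReplicas,
      List<String> requiredStorageTypes) {
    // Old form: if ((numOfReplicas = requiredStorageTypes.size()) == 0) {...}
    // clobbered the requested replica count. With a final parameter, the
    // size can only be tested, never assigned back.
    if (requiredStorageTypes.isEmpty()) {
      return 0; // no storage types left to satisfy
    }
    return numOfReplicas; // the caller's requested count, untouched
  }

  public static void main(String[] args) {
    // The client asked for 1 additional node while 2 storage types are
    // still required; the requested count of 1 is preserved.
    System.out.println(chooseTargetCount(1, List.of("SSD", "SSD")));
  }
}
```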

##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockStoragePolicy.java
##########
@@ -1337,6 +1337,51 @@ public void testChooseSsdOverDisk() throws Exception {
     Assert.assertEquals(StorageType.DISK, targets[1].getStorageType());
   }
 
+  @Test
+  public void testAddDatanode2ExistingPipelineInSsd() throws Exception {
+    BlockStoragePolicy policy = POLICY_SUITE.getPolicy(ALLSSD);
+
+    final String[] racks = {"/d1/r1", "/d2/r2", "/d3/r3", "/d4/r4", "/d5/r5",
+        "/d6/r6", "/d7/r7"};
+    final String[] hosts = {"host1", "host2", "host3", "host4", "host5",
+        "host6", "host7"};
+    final StorageType[] disks = {StorageType.DISK, StorageType.DISK,
+        StorageType.DISK};
+
+    final DatanodeStorageInfo[] diskStorages
+        = DFSTestUtil.createDatanodeStorageInfos(7, racks, hosts, disks);
+    final DatanodeDescriptor[] dataNodes
+        = DFSTestUtil.toDatanodeDescriptor(diskStorages);
+    for (int i = 0; i < dataNodes.length; i++) {
+      BlockManagerTestUtil.updateStorage(dataNodes[i],
+          new DatanodeStorage("ssd" + (i + 1), DatanodeStorage.State.NORMAL,
+              StorageType.SSD));
+    }
+
+    FileSystem.setDefaultUri(conf, "hdfs://localhost:0");
+    conf.set(DFSConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY, "0.0.0.0:0");
+    File baseDir = PathUtils.getTestDir(TestReplicationPolicy.class);
+    conf.set(DFSConfigKeys.DFS_NAMENODE_NAME_DIR_KEY,
+        new File(baseDir, "name").getPath());
+    DFSTestUtil.formatNameNode(conf);
+    NameNode namenode = new NameNode(conf);
+
+    final BlockManager bm = namenode.getNamesystem().getBlockManager();
+    BlockPlacementPolicy replicator = bm.getBlockPlacementPolicy();
+    NetworkTopology cluster = bm.getDatanodeManager().getNetworkTopology();
+    for (DatanodeDescriptor datanode : dataNodes) {
+      cluster.add(datanode);
+    }
+    // chosenDs are DISK StorageType to simulate not enough SSD storage
+    List<DatanodeStorageInfo> chosenDs = new ArrayList<>();
+    chosenDs.add(diskStorages[0]);
+    chosenDs.add(diskStorages[1]);
+    DatanodeStorageInfo[] targets = replicator.chooseTarget("/foo", 1,
+        null, chosenDs, true,
+        new HashSet<Node>(), 0, policy, null);
+    System.out.println(policy.getName() + ": " + Arrays.asList(targets));

Review comment:
       Please use log4j to log the message.
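For reference, a hedged sketch of what the reviewer is asking for, assuming the test class follows the usual Hadoop pattern of an slf4j Logger backed by log4j (the logger field and class name here are assumptions, not the actual TestBlockStoragePolicy code):

```
import java.util.Arrays;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch: route the message through the logging framework
// instead of System.out.println.
public class LogTargetsSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(LogTargetsSketch.class);

  static void logTargets(String policyName, Object[] targets) {
    // Parameterized logging defers building the final string until the
    // INFO level is confirmed to be enabled.
    LOG.info("{}: {}", policyName, Arrays.asList(targets));
  }
}
```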




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



Issue Time Tracking
-------------------

    Worklog Id:     (was: 641488)
    Time Spent: 1h  (was: 50m)

> numOfReplicas is given the wrong value in
> BlockPlacementPolicyDefault$chooseTarget, which can cause DataStreamer to
> fail with heterogeneous storage
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16182
>                 URL: https://issues.apache.org/jira/browse/HDFS-16182
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.4.0
>            Reporter: Max Xie
>            Assignee: Max Xie
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS-16182.patch
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> In our HDFS cluster, we use heterogeneous storage to keep data on SSD for
> better performance. Sometimes, while the HDFS client is transferring data in
> a pipeline, it throws an IOException and exits. The exception log is below:
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing
> pipeline due to no more good datanodes being available to try. (Nodes:
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
> The current failed datanode replacement policy is DEFAULT, and a client may
> configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy'
> in its configuration.
> ```
> After digging into it, I found that when an existing pipeline needs a new
> datanode to keep transferring data, the client gets one additional datanode
> from the namenode and checks that the number of nodes equals the original
> number + 1:
> ```
> // DataStreamer#findNewDatanode
> if (nodes.length != original.length + 1) {
>   throw new IOException(
>       "Failed to replace a bad datanode on the existing pipeline "
>           + "due to no more good datanodes being available to try. "
>           + "(Nodes: current=" + Arrays.asList(nodes)
>           + ", original=" + Arrays.asList(original) + "). "
>           + "The current failed datanode replacement policy is "
>           + dfsClient.dtpReplaceDatanodeOnFailure
>           + ", and a client may configure this via '"
>           + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>           + "' in its configuration.");
> }
> ```
> The root cause is that NameNode$getAdditionalDatanode returns multiple
> datanodes, not the single one that DataStreamer.addDatanode2ExistingPipeline
> asked for.
>
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget: I think
> numOfReplicas should not be overwritten with requiredStorageTypes.size().
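> A hypothetical standalone sketch of the failure mode (not the actual patch;
> node names follow the exception log above):
> ```
> import java.util.Arrays;
>
> // Sketch: the namenode hands back three additional SSD targets although
> // the client asked for exactly one, so the pipeline-recovery length
> // check in DataStreamer trips.
> public class PipelineCheckSketch {
>   public static void main(String[] args) {
>     String[] original = {"dn01:DISK", "dn02:DISK"};
>     // Buggy chooseTarget overwrote numOfReplicas (1) with
>     // requiredStorageTypes.size() (3) and returned three extra targets.
>     String[] nodes = {"dn01:DISK", "dn02:DISK",
>         "dn03:SSD", "dn04:SSD", "dn05:SSD"};
>     if (nodes.length != original.length + 1) {
>       System.out.println("Failed to replace a bad datanode: current="
>           + Arrays.asList(nodes) + ", original=" + Arrays.asList(original));
>     }
>   }
> }
> ```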


