[ https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=608376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-608376 ]
ASF GitHub Bot logged work on HDFS-13671:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 08/Jun/21 10:10
Start Date: 08/Jun/21 10:10
Worklog Time Spent: 10m
Work Description: aajisaka commented on a change in pull request #3065:
URL: https://github.com/apache/hadoop/pull/3065#discussion_r647294150
##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaMap.java
##########
@@ -238,9 +221,10 @@ ReplicaInfo remove(String bpid, long blockId) {
* @return the number of replicas in the map
*/
int size(String bpid) {
+ LightWeightResizableGSet<Block, ReplicaInfo> m = null;
try (AutoCloseableLock l = readLock.acquire()) {
- FoldedTreeSet<ReplicaInfo> set = map.get(bpid);
- return set != null ? set.size() : 0;
+ m = map.get(bpid);
Review comment:
The declaration of `m` can be moved into the try block.
```suggestion
GSet<Block, ReplicaInfo> m = map.get(bpid);
```
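For reference, a minimal sketch of how `size` could read with the declaration moved inside the try block (based only on the diff above; the surrounding fields are assumed to exist as shown):
```java
int size(String bpid) {
  try (AutoCloseableLock l = readLock.acquire()) {
    // Declared inside the try block, so the reference never
    // escapes the scope guarded by the read lock.
    GSet<Block, ReplicaInfo> m = map.get(bpid);
    return m != null ? m.size() : 0;
  }
}
```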
##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
##########
@@ -262,30 +257,61 @@ public AddBlockResult addBlock(BlockInfo b, Block reportedBlock) {
}
}
+ // add to the head of the data-node list
b.addStorage(this, reportedBlock);
- blocks.add(b);
+ insertToList(b);
return result;
}
AddBlockResult addBlock(BlockInfo b) {
return addBlock(b, b);
}
- boolean removeBlock(BlockInfo b) {
- blocks.remove(b);
- return b.removeStorage(this);
+ public void insertToList(BlockInfo b) {
+ blockList = b.listInsert(blockList, this);
+ numBlocks++;
+ }
+ public boolean removeBlock(BlockInfo b) {
+ blockList = b.listRemove(blockList, this);
+ if (b.removeStorage(this)) {
+ numBlocks--;
+ return true;
+ } else {
+ return false;
+ }
}
int numBlocks() {
- return blocks.size();
+ return numBlocks;
}
-
+
+ Iterator<BlockInfo> getBlockIterator() {
+ return new BlockIterator(blockList);
+ }
+
/**
- * @return iterator to an unmodifiable set of blocks
- * related to this {@link DatanodeStorageInfo}
+ * Move block to the head of the list of blocks belonging to the data-node.
+ * @return the index of the head of the blockList
*/
- Iterator<BlockInfo> getBlockIterator() {
- return Collections.unmodifiableSet(blocks).iterator();
+ int moveBlockToHead(BlockInfo b, int curIndex, int headIndex) {
+ blockList = b.moveBlockToHead(blockList, this, curIndex, headIndex);
+ return curIndex;
+ }
+
+ int getHeadIndex(DatanodeStorageInfo storageInfo) {
+ if (blockList == null) {
+ return -1;
+ }
+ return blockList.findStorageInfo(storageInfo);
+ }
Review comment:
This function is unused. It can be removed.
##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
##########
@@ -118,13 +116,12 @@ public void testBlockHasMultipleReplicasOnSameDN() throws IOException {
StorageBlockReport reports[] =
new StorageBlockReport[cluster.getStoragesPerDatanode()];
- ArrayList<ReplicaInfo> blocks = new ArrayList<>();
+ ArrayList<Replica> blocks = new ArrayList<Replica>();
Review comment:
Nit:
```suggestion
ArrayList<Replica> blocks = new ArrayList<>();
```
##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
##########
@@ -256,9 +256,6 @@ message BlockReportContextProto {
// The block report lease ID, or 0 if we are sending without a lease to
// bypass rate-limiting.
optional uint64 leaseId = 4 [ default = 0 ];
-
- // True if the reported blocks are sorted by increasing block IDs
- optional bool sorted = 5 [default = false];
Review comment:
We must document that field number 5 cannot be reused.
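For example, a sketch of how the reservation could be recorded in DatanodeProtocol.proto (the `reserved` statement is one option; the patch may simply use a comment instead):
```proto
message BlockReportContextProto {
  // ... fields 1-4 as above ...

  // Field 5 was `optional bool sorted` before the FoldedTreeSet
  // revert (HDFS-13671); reserve it so the number is never reused.
  reserved 5;
}
```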
##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
##########
@@ -1996,7 +1996,12 @@ private void metaSave(PrintWriter out) {
LightWeightHashSet<Long> openFileIds = new LightWeightHashSet<>();
for (DatanodeDescriptor dataNode :
blockManager.getDatanodeManager().getDatanodes()) {
- for (long ucFileId : dataNode.getLeavingServiceStatus().getOpenFiles()) {
+ // Sort open files
+ LightWeightHashSet<Long> dnOpenFiles =
+ dataNode.getLeavingServiceStatus().getOpenFiles();
+ Long[] dnOpenFileIds = new Long[dnOpenFiles.size()];
+ Arrays.sort(dnOpenFiles.toArray(dnOpenFileIds));
+ for (Long ucFileId : dnOpenFileIds) {
Review comment:
I thought we had to sort the inode IDs across all the DataNodes. However,
this can be addressed in a separate jira, because the IDs are not sorted
across the DataNodes even before applying the patch.
##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java
##########
@@ -262,30 +257,61 @@ public AddBlockResult addBlock(BlockInfo b, Block reportedBlock) {
}
}
+ // add to the head of the data-node list
b.addStorage(this, reportedBlock);
- blocks.add(b);
+ insertToList(b);
return result;
}
AddBlockResult addBlock(BlockInfo b) {
return addBlock(b, b);
}
- boolean removeBlock(BlockInfo b) {
- blocks.remove(b);
- return b.removeStorage(this);
+ public void insertToList(BlockInfo b) {
+ blockList = b.listInsert(blockList, this);
+ numBlocks++;
+ }
+ public boolean removeBlock(BlockInfo b) {
Review comment:
I think `DatanodeStorageInfo#removeBlock` and
`ProvidedStorageMap#removeBlock` can be package-private.
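For instance, a sketch of the tightened signature, assuming all callers live in the blockmanagement package:
```java
// Package-private: only blockmanagement classes remove blocks directly.
boolean removeBlock(BlockInfo b) {
  blockList = b.listRemove(blockList, this);
  if (b.removeStorage(this)) {
    numBlocks--;
    return true;
  }
  return false;
}
```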
##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
##########
@@ -1695,10 +1630,9 @@ private void verifyPlacementPolicy(final MiniDFSCluster cluster,
LocatedBlock lb = DFSTestUtil.getAllBlocks(dfs, file).get(0);
BlockInfo blockInfo =
blockManager.getStoredBlock(lb.getBlock().getLocalBlock());
- Iterator<DatanodeStorageInfo> itr = blockInfo.getStorageInfos();
LOG.info("Block " + blockInfo + " storages: ");
- while (itr.hasNext()) {
- DatanodeStorageInfo dn = itr.next();
+ for (int i = 0; i < 3; i++) {
+ DatanodeStorageInfo dn = blockInfo.getStorageInfo(i);
Review comment:
Is there any specific reason to use `#getStorageInfo(i)` instead of
`#getStorageInfos()`? Do you need to assert that the size is >= 3?
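For comparison, the iterator-based form the test used before this change (adapted from the removed lines in the diff; an explicit size assertion would be an addition):
```java
Iterator<DatanodeStorageInfo> itr = blockInfo.getStorageInfos();
LOG.info("Block " + blockInfo + " storages: ");
while (itr.hasNext()) {
  DatanodeStorageInfo dn = itr.next();
  // ... same per-storage checks as in the current loop body ...
}
```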
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 608376)
Time Spent: 1h 40m (was: 1.5h)
> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> ----------------------------------------------------------------------
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.0, 3.0.3
> Reporter: Yiqun Lin
> Assignee: Haibin Huang
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-13671-001.patch
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0
> tid=0x00007fb505b27800 nid=0x94c3 runnable [0x00007fa861361000]
> java.lang.Thread.State: RUNNABLE
> at
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
> at
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
> at
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
> at
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in NameNode, there are mainly two steps:
> * Collect INodes and all blocks to be deleted, then delete INodes.
> * Remove blocks chunk by chunk in a loop.
> Actually, the first step should be the more expensive operation and should
> take more time. However, we now consistently see the NN hang during the
> remove-block operation.
> Looking into this: we introduced the new structure {{FoldedTreeSet}} to get
> better performance when handling FBRs/IBRs. But compared with the earlier
> implementation of the remove-block logic, {{FoldedTreeSet}} is slower, since
> it takes additional time to rebalance tree nodes. When there are many blocks
> to be removed/deleted, this looks bad.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide
> {{getBlockIterator}} to return an iterator over the blocks; there is no
> other get operation for a specified block. Do we still need
> {{FoldedTreeSet}} in {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}}
> benefits gets, not updates. Maybe we can revert to the earlier
> implementation.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)