[ https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=611796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611796 ]

ASF GitHub Bot logged work on HDFS-13671:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Jun/21 08:08
            Start Date: 16/Jun/21 08:08
    Worklog Time Spent: 10m 
      Work Description: AlphaGouGe commented on a change in pull request #3065:
URL: https://github.com/apache/hadoop/pull/3065#discussion_r652451949



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
##########
@@ -3111,106 +3042,127 @@ void processFirstBlockReport(
     }
   }
 
-  private void reportDiffSorted(DatanodeStorageInfo storageInfo,
-      Iterable<BlockReportReplica> newReport,
+  private void reportDiff(DatanodeStorageInfo storageInfo,
+      BlockListAsLongs newReport,
       Collection<BlockInfoToAdd> toAdd,     // add to DatanodeDescriptor
       Collection<BlockInfo> toRemove,       // remove from DatanodeDescriptor
       Collection<Block> toInvalidate,       // should be removed from DN
       Collection<BlockToMarkCorrupt> toCorrupt, // add to corrupt replicas list
       Collection<StatefulBlockInfo> toUC) { // add to under-construction list
 
-    // The blocks must be sorted and the storagenodes blocks must be sorted
-    Iterator<BlockInfo> storageBlocksIterator = storageInfo.getBlockIterator();
+    // place a delimiter in the list which separates blocks
+    // that have been reported from those that have not
     DatanodeDescriptor dn = storageInfo.getDatanodeDescriptor();
-    BlockInfo storageBlock = null;
-
-    for (BlockReportReplica replica : newReport) {
-
-      long replicaID = replica.getBlockId();
-      if (BlockIdManager.isStripedBlockID(replicaID)
-          && (!hasNonEcBlockUsingStripedID ||
-              !blocksMap.containsBlock(replica))) {
-        replicaID = BlockIdManager.convertToStripedID(replicaID);
-      }
-
-      ReplicaState reportedState = replica.getState();
-
-      LOG.debug("Reported block {} on {} size {} replicaState = {}",
-          replica, dn, replica.getNumBytes(), reportedState);
-
-      if (shouldPostponeBlocksFromFuture
-          && isGenStampInFuture(replica)) {
-        queueReportedBlock(storageInfo, replica, reportedState,
-                           QUEUE_REASON_FUTURE_GENSTAMP);
-        continue;
-      }
-
-      if (storageBlock == null && storageBlocksIterator.hasNext()) {
-        storageBlock = storageBlocksIterator.next();
-      }
-
-      do {
-        int cmp;
-        if (storageBlock == null ||
-            (cmp = Long.compare(replicaID, storageBlock.getBlockId())) < 0) {
-          // Check if block is available in NN but not yet on this storage
-          BlockInfo nnBlock = blocksMap.getStoredBlock(new Block(replicaID));
-          if (nnBlock != null) {
-            reportDiffSortedInner(storageInfo, replica, reportedState,
-                                  nnBlock, toAdd, toCorrupt, toUC);
-          } else {
-            // Replica not found anywhere so it should be invalidated
-            toInvalidate.add(new Block(replica));
-          }
-          break;
-        } else if (cmp == 0) {
-          // Replica matched current storageblock
-          reportDiffSortedInner(storageInfo, replica, reportedState,
-                                storageBlock, toAdd, toCorrupt, toUC);
-          storageBlock = null;
-        } else {
-          // replica has higher ID than storedBlock
-          // Remove all stored blocks with IDs lower than replica
-          do {
-            toRemove.add(storageBlock);
-            storageBlock = storageBlocksIterator.hasNext()
-                           ? storageBlocksIterator.next() : null;
-          } while (storageBlock != null &&
-                   Long.compare(replicaID, storageBlock.getBlockId()) > 0);
+    Block delimiterBlock = new Block();
+    BlockInfo delimiter = new BlockInfoContiguous(delimiterBlock,
+        (short) 1);
+    AddBlockResult result = storageInfo.addBlock(delimiter, delimiterBlock);
+    assert result == AddBlockResult.ADDED
+        : "Delimiting block cannot be present in the node";
+    int headIndex = 0; //currently the delimiter is in the head of the list
+    int curIndex;
+
+    if (newReport == null) {
+      newReport = BlockListAsLongs.EMPTY;
+    }
+    // scan the report and process newly reported blocks
+    for (BlockReportReplica iblk : newReport) {
+      ReplicaState iState = iblk.getState();
+      LOG.debug("Reported block {} on {} size {} replicaState = {}", iblk, dn,
+          iblk.getNumBytes(), iState);
+      BlockInfo storedBlock = processReportedBlock(storageInfo,
+          iblk, iState, toAdd, toInvalidate, toCorrupt, toUC);
+
+      // move block to the head of the list
+      if (storedBlock != null) {
+        curIndex = storedBlock.findStorageInfo(storageInfo);
+        if (curIndex >= 0) {
+          headIndex =
+              storageInfo.moveBlockToHead(storedBlock, curIndex, headIndex);
         }
-      } while (storageBlock != null);
+      }
     }
 
-    // Iterate any remaining blocks that have not been reported and remove them
-    while (storageBlocksIterator.hasNext()) {
-      toRemove.add(storageBlocksIterator.next());
+    // collect blocks that have not been reported
+    // all of them are next to the delimiter
+    Iterator<BlockInfo> it =
+        storageInfo.new BlockIterator(delimiter.getNext(0));
+    while (it.hasNext()) {
+      toRemove.add(it.next());
     }
+    storageInfo.removeBlock(delimiter);
   }
 
-  private void reportDiffSortedInner(
+  /**
+   * Process a block replica reported by the data-node.
+   * No side effects except adding to the passed-in Collections.
+   *
+   * <ol>
+   * <li>If the block is not known to the system (not in blocksMap) then the
+   * data-node should be notified to invalidate this block.</li>
+   * <li>If the reported replica is valid that is has the same generation stamp
+   * and length as recorded on the name-node, then the replica location should
+   * be added to the name-node.</li>
+   * <li>If the reported replica is not valid, then it is marked as corrupt,
+   * which triggers replication of the existing valid replicas.
+   * Corrupt replicas are removed from the system when the block
+   * is fully replicated.</li>
+   * <li>If the reported replica is for a block currently marked "under
+   * construction" in the NN, then it should be added to the
+   * BlockUnderConstructionFeature's list of replicas.</li>
+   * </ol>
+   *
+   * @param storageInfo DatanodeStorageInfo that sent the report.
+   * @param block reported block replica
+   * @param reportedState reported replica state
+   * @param toAdd add to DatanodeDescriptor
+   * @param toInvalidate missing blocks (not in the blocks map)
+   *        should be removed from the data-node
+   * @param toCorrupt replicas with unexpected length or generation stamp;
+   *        add to corrupt replicas
+   * @param toUC replicas of blocks currently under construction
+   * @return the up-to-date stored block, if it should be kept.
+   *         Otherwise, null.
+   */
+  private BlockInfo processReportedBlock(
       final DatanodeStorageInfo storageInfo,
-      final BlockReportReplica replica, final ReplicaState reportedState,
-      final BlockInfo storedBlock,
+      final Block block, final ReplicaState reportedState,
       final Collection<BlockInfoToAdd> toAdd,
+      final Collection<Block> toInvalidate,
       final Collection<BlockToMarkCorrupt> toCorrupt,
       final Collection<StatefulBlockInfo> toUC) {
 
-    assert replica != null;
-    assert storedBlock != null;
-
     DatanodeDescriptor dn = storageInfo.getDatanodeDescriptor();
+
+    LOG.debug("Reported block {} on {} size {}  replicaState = {}", block, dn,

Review comment:
       I have changed it back, removing `if (LOG.isDebugEnabled())`.
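
For context on why the guard can simply be dropped: with SLF4J's parameterized logging, the message is only formatted when DEBUG is actually enabled, so wrapping a plain `LOG.debug(...)` in `if (LOG.isDebugEnabled())` buys nothing unless computing an argument is itself expensive. A minimal sketch, with illustrative names (`GuardedLoggingSketch` and `buildExpensiveSummary` are not from the patch):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GuardedLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(GuardedLoggingSketch.class);

  void report(Object block, Object dn, long numBytes, Object state) {
    // Parameterized logging: the format string is rendered only if DEBUG
    // is enabled, so no explicit guard is needed for cheap arguments.
    LOG.debug("Reported block {} on {} size {} replicaState = {}",
        block, dn, numBytes, state);

    // A guard is still worthwhile when building an argument is expensive.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Expensive detail: {}", buildExpensiveSummary());
    }
  }

  private String buildExpensiveSummary() {
    return "..."; // stands in for a costly computation
  }
}
{code}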




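To summarize the approach this diff restores: reportDiff places a delimiter block at the head of the storage's block list, moves every reported block it finds to the head, and whatever still sits behind the delimiter after the scan was never reported and goes to toRemove. A self-contained sketch of the idea, using a plain LinkedList as a stand-in (the real code moves blocks in O(1) via DatanodeStorageInfo#moveBlockToHead rather than a linear remove):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class ReportDiffSketch {
  public static void main(String[] args) {
    // Replicas currently recorded for one storage.
    LinkedList<String> stored =
        new LinkedList<>(Arrays.asList("blk_1", "blk_2", "blk_3", "blk_4"));
    List<String> reported = Arrays.asList("blk_2", "blk_4", "blk_5");

    // 1. Place a delimiter at the head; anything still behind it at the
    //    end of the scan was not reported.
    final String delimiter = "DELIMITER";
    stored.addFirst(delimiter);

    List<String> toInvalidate = new ArrayList<>(); // reported but unknown
    for (String blk : reported) {
      // 2. Move each known reported block to the head of the list.
      if (stored.remove(blk)) {
        stored.addFirst(blk);
      } else {
        toInvalidate.add(blk);
      }
    }

    // 3. Blocks remaining after the delimiter were not reported.
    int cut = stored.indexOf(delimiter);
    List<String> toRemove =
        new ArrayList<>(stored.subList(cut + 1, stored.size()));

    System.out.println("toRemove = " + toRemove);         // [blk_1, blk_3]
    System.out.println("toInvalidate = " + toInvalidate); // [blk_5]
  }
}
{code}

The head-move means report membership is tracked purely by position in the list, so no per-report hash set has to be allocated.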


Issue Time Tracking
-------------------

    Worklog Id:     (was: 611796)
    Time Spent: 5h 40m  (was: 5.5h)

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> ----------------------------------------------------------------------
>
>                 Key: HDFS-13671
>                 URL: https://issues.apache.org/jira/browse/HDFS-13671
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.1.0, 3.0.3
>            Reporter: Yiqun Lin
>            Assignee: Haibin Huang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, image-2021-06-10-19-28-58-359.png
>
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x00007fb505b27800 nid=0x94c3 runnable [0x00007fa861361000]
>    java.lang.Thread.State: RUNNABLE
>       at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>       at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>       at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>       at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>       at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>       at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>       at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in the NameNode, there are mainly two steps:
> * Collect the INodes and all blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop.
> The first step should actually be the more expensive operation and take 
> more time. However, we now always see the NN hang during the remove-block 
> operation.
> Looking into this: the new structure {{FoldedTreeSet}} was introduced to 
> get better performance when dealing with FBRs/IBRs. But compared with the 
> early implementation of the remove-block logic, {{FoldedTreeSet}} is 
> slower, since it takes additional time to rebalance tree nodes. When there 
> are many blocks to be removed/deleted, this looks bad.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> for a specified block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not 
> Update. Maybe we can revert this to the early implementation.



