[ https://issues.apache.org/jira/browse/HDFS-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773563#comment-17773563 ]

Haiyang Hu edited comment on HDFS-17218 at 10/10/23 6:48 AM:
-------------------------------------------------------------

Sure, attaching code snippets that show the current implementation:
{quote}
6. When the DN restarts and the FBR is executed (processFirstBlockReport will not be executed here; processReport will be executed), since block1 is not a new block, the processExtraRedundancy logic will not be executed.
{quote}
In HeartbeatManager#register(final DatanodeDescriptor d)
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java#L230-L238

Here the current DN is still alive (the heartbeat expiration interval has not been exceeded), so the DN registration will not call d.updateHeartbeatState, and storageInfo.hasReceivedBlockReport() is still true:

{code:java}
synchronized void register(final DatanodeDescriptor d) {
  if (!d.isAlive()) {
    addDatanode(d);

    //update its timestamp
    d.updateHeartbeatState(StorageReport.EMPTY_ARRAY, 0L, 0L, 0, 0, null);
    stats.add(d);
  }
}
{code}
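
For context, the liveness check behind d.isAlive() looks roughly like the following (a paraphrased sketch of DatanodeManager#isDatanodeDead; field names may differ by branch). With default settings the expiry window is about 10.5 minutes, so a DN that restarts and re-registers within that window is still considered alive and skips the updateHeartbeatState call above:

{code:java}
// Paraphrased sketch, not verbatim trunk code: a node is considered dead
// only once its last heartbeat is older than heartbeatExpireInterval
// (roughly 2 * heartbeat recheck interval + 10 * heartbeat interval).
boolean isDatanodeDead(DatanodeDescriptor node) {
  return (node.getLastUpdateMonotonic() <
          (monotonicNow() - heartbeatExpireInterval));
}
{code}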

In BlockManager#processReport, when the restarted DN runs the FBR, the current DN is still alive and storageInfo.hasReceivedBlockReport() is true, so the processReport method will be called:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L2916-L2946

{code:java}
      if (!storageInfo.hasReceivedBlockReport()) {
        // The first block report can be processed a lot more efficiently than
        // ordinary block reports.  This shortens restart times.
        blockLog.info("BLOCK* processReport 0x{} with lease ID 0x{}: Processing 
first "
            + "storage report for {} from datanode {}",
            strBlockReportId, fullBrLeaseId,
            storageInfo.getStorageID(),
            nodeID);
        processFirstBlockReport(storageInfo, newReport);
      } else {
        // Block reports for provided storage are not
        // maintained by DN heartbeats
        if (!StorageType.PROVIDED.equals(storageInfo.getStorageType())) {
          invalidatedBlocks = processReport(storageInfo, newReport);
        }
      }
      storageInfo.receivedBlockReport();
    } finally {
      endTime = Time.monotonicNow();
      namesystem.writeUnlock("processReport");
    }

    if (blockLog.isDebugEnabled()) {
      for (Block b : invalidatedBlocks) {
        blockLog.debug("BLOCK* processReport 0x{} with lease ID 0x{}: {} on 
node {} size {} " +
                "does not belong to any file.", strBlockReportId, 
fullBrLeaseId, b,
            node, b.getNumBytes());
      }
    }
{code}
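
The hasReceivedBlockReport() flag is backed by a simple per-storage counter (a paraphrased sketch of DatanodeStorageInfo, assuming trunk names). Because a quick restart never resets the counter, the first-report fast path above is skipped:

{code:java}
// Paraphrased sketch of DatanodeStorageInfo: the counter is incremented for
// every processed report, so a DN that re-registers while still "alive"
// keeps blockReportCount > 0 and takes the processReport branch.
private int blockReportCount = 0;

boolean hasReceivedBlockReport() {
  return blockReportCount > 0;
}

void receivedBlockReport() {
  blockReportCount++;
}
{code}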

When BlockManager#processReport processes the FBR, the current DatanodeStorageInfo already exists in the triplets of the BlockInfo corresponding to the reported block, so the block will not be added to the toAdd list, and the addStoredBlock and processExtraRedundancy logic will not be executed:
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3044-L3085

{code:java}
Collection<Block> processReport(
      final DatanodeStorageInfo storageInfo,
      final BlockListAsLongs report) throws IOException {
    // Normal case:
    // Modify the (block-->datanode) map, according to the difference
    // between the old and new block report.
    //
    Collection<BlockInfoToAdd> toAdd = new ArrayList<>();
    Collection<BlockInfo> toRemove = new HashSet<>();
    Collection<Block> toInvalidate = new ArrayList<>();
    Collection<BlockToMarkCorrupt> toCorrupt = new ArrayList<>();
    Collection<StatefulBlockInfo> toUC = new ArrayList<>();
    reportDiff(storageInfo, report,
                 toAdd, toRemove, toInvalidate, toCorrupt, toUC);

    DatanodeDescriptor node = storageInfo.getDatanodeDescriptor();
    // Process the blocks on each queue
    for (StatefulBlockInfo b : toUC) {
      addStoredBlockUnderConstruction(b, storageInfo);
    }
    for (BlockInfo b : toRemove) {
      removeStoredBlock(b, node);
    }
    int numBlocksLogged = 0;
    for (BlockInfoToAdd b : toAdd) {
      addStoredBlock(b.stored, b.reported, storageInfo, null,
          numBlocksLogged < maxNumBlocksToLog);
      numBlocksLogged++;
    }
    if (numBlocksLogged > maxNumBlocksToLog) {
      blockLog.info("BLOCK* processReport: logged info for {} of {} " +
          "reported.", maxNumBlocksToLog, numBlocksLogged);
    }
    for (Block b : toInvalidate) {
      addToInvalidates(b, node);
    }
    for (BlockToMarkCorrupt b : toCorrupt) {
      markBlockAsCorrupt(b, storageInfo, node);
    }

    return toInvalidate;
  }
{code}
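
The reason the replica never lands in toAdd is the storage-membership check performed while diffing the report (a simplified sketch of the reportDiff/processReportedBlock logic, not the exact trunk control flow, which also handles the corrupt, under-construction, and invalidation cases):

{code:java}
// Simplified sketch: a replica that is already attached to the reporting
// storage is left in place, so addStoredBlock (and with it
// processExtraRedundancy) is never invoked for it.
BlockInfo storedBlock = blocksMap.getStoredBlock(replica);
if (storedBlock != null && storedBlock.findStorageInfo(storageInfo) >= 0) {
  // Already recorded on this DatanodeStorageInfo: nothing to add.
} else if (storedBlock != null) {
  toAdd.add(new BlockInfoToAdd(storedBlock, new Block(replica)));
}
{code}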



> NameNode should remove its excess blocks from the ExcessRedundancyMap When a 
> DN registers
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-17218
>                 URL: https://issues.apache.org/jira/browse/HDFS-17218
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>
> Currently we found that a DN will lose all pending DNA_INVALIDATE blocks if 
> it restarts.
> *Root cause*
> The current DN enables asynchronous deletion, so it may have many pending 
> deletion blocks in memory.
> When the DN restarts, these cached blocks may be lost. This causes some 
> blocks in the excess map in the NameNode to be leaked, which results in many 
> blocks having more replicas than expected.
> *solution*
> Consider NameNode should remove its excess blocks from the 
> ExcessRedundancyMap When a DN registers,
> this approach will ensure that when processing the DN's full block report, 
> the 'processExtraRedundancy' can be performed according to the actual of the 
> blocks.
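
A minimal sketch of the proposed direction (the helper name removeExcessRedundancyMap is an assumption for illustration, not the committed patch): clear the registering DN's entries from the ExcessRedundancyMap during registration, so the subsequent FBR re-evaluates redundancy against the DN's actual blocks:

{code:java}
// Hypothetical hook, names assumed for illustration: on DN registration,
// drop all excess-replica bookkeeping for that DN so that the next full
// block report can re-run processExtraRedundancy from a clean state.
void registerDatanode(DatanodeRegistration nodeReg) throws IOException {
  DatanodeDescriptor node = getDatanode(nodeReg.getDatanodeUuid());
  if (node != null) {
    // Assumed helper on BlockManager: removes every entry for this DN
    // from the ExcessRedundancyMap.
    blockManager.removeExcessRedundancyMap(node);
  }
  // ... existing registration logic continues ...
}
{code}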


