[jira] [Updated] (HDFS-17780) The retry logic in IncrementalBlockReport may bypass the configured IBR interval, causing contention on NameNode

ASF GitHub Bot (Jira) Sun, 11 May 2025 12:10:36 -0700


     [ 
https://issues.apache.org/jira/browse/HDFS-17780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated HDFS-17780:
----------------------------------
    Labels: pull-request-available  (was: )

> The retry logic in IncrementalBlockReport may bypass the configured IBR 
> interval, causing contention on NameNode
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17780
>                 URL: https://issues.apache.org/jira/browse/HDFS-17780
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.10.2, 3.4.1
>            Reporter: Shangshu Qian
>            Priority: Major
>              Labels: pull-request-available
>
> In the current IncrementalBlockReportManager.sendIBRs(), the IBR is retried if 
> the RPC (blockReceivedAndDeleted) to the NN fails.
>  
> {code:java}
>   void sendIBRs(DatanodeProtocol namenode, DatanodeRegistration registration,
>       String bpid) throws IOException {
>     // Generate a list of the pending reports for each storage under the lock
>     final StorageReceivedDeletedBlocks[] reports = generateIBRs();
>     if (reports.length == 0) {
>       // Nothing new to report.
>       return;
>     }
>
>     // Send incremental block reports to the Namenode outside the lock
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("call blockReceivedAndDeleted: " + Arrays.toString(reports));
>     }
>     boolean success = false;
>     final long startTime = monotonicNow();
>     try {
>       namenode.blockReceivedAndDeleted(registration, bpid, reports);
>       success = true;
>     } finally {
>       if (success) {
>         dnMetrics.addIncrementalBlockReport(monotonicNow() - startTime);
>         lastIBR = startTime;
>       } else {
>         // If we didn't succeed in sending the report, put all of the
>         // blocks back onto our queue, but only in the case where we
>         // didn't put something newer in the meantime.
>         putMissing(reports);
>       }
>     }
>   } {code}
> On failure, `lastIBR` is not updated and the reports are put back on the 
> queue by `putMissing()`, so the failed IBRs will be retried. However, because 
> `lastIBR` is unchanged, the retry bypasses the configured 
> `dfs.blockreport.incremental.intervalMsec` and is sent on the next heartbeat.
>  
> If `blockReceivedAndDeleted` fails because of high load on the NameNode, 
> such retries only make the contention worse, resulting in a feedback loop.
>  
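
The effect described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the Hadoop code and not necessarily what the attached pull request does: it assumes a send gate that compares the time elapsed since `lastIBR` against the configured interval, and it assumes every RPC fails. The class name `IbrRetrySketch`, the method `countAttempts`, and the `recordFailedAttempts` flag are hypothetical names introduced for illustration.

```java
/** Hypothetical model of the IBR send gate; not the actual Hadoop implementation. */
class IbrRetrySketch {

  /**
   * Counts the RPC attempts made over a run of heartbeats when every RPC fails.
   *
   * @param heartbeats           number of heartbeat ticks to simulate
   * @param heartbeatMs          time between heartbeats, in milliseconds
   * @param ibrIntervalMs        configured IBR interval, in milliseconds
   * @param recordFailedAttempts if true, lastIBR is also advanced on failure
   *                             (an assumed variant, shown only for comparison)
   */
  static int countAttempts(int heartbeats, long heartbeatMs, long ibrIntervalMs,
      boolean recordFailedAttempts) {
    // Start one full interval in the past so the first heartbeat may send.
    long lastIBR = -ibrIntervalMs;
    int attempts = 0;
    for (int i = 0; i < heartbeats; i++) {
      long now = i * heartbeatMs;
      // Assumed gate: send only once the interval has elapsed since lastIBR.
      if (now - lastIBR >= ibrIntervalMs) {
        attempts++;
        boolean success = false; // model an overloaded NameNode: the RPC always fails
        if (success || recordFailedAttempts) {
          lastIBR = now;
        }
        // On failure the reports stay queued, so they are eligible again.
      }
    }
    return attempts;
  }

  public static void main(String[] args) {
    // 10 heartbeats, 3 s apart, with a 30 s IBR interval.
    System.out.println(countAttempts(10, 3000L, 30000L, false)); // prints 10
    System.out.println(countAttempts(10, 3000L, 30000L, true));  // prints 1
  }
}
```

In this model, leaving `lastIBR` untouched on failure yields one attempt per heartbeat, while advancing it on failure keeps the retries at the configured interval.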



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
