[jira] [Commented] (HDFS-17780) The retry logic in IncrementalBlockReport may bypass the configured IBR interval, causing contention on NameNode

ASF GitHub Bot (Jira) Sun, 11 May 2025 12:17:39 -0700


    [ https://issues.apache.org/jira/browse/HDFS-17780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950796#comment-17950796 ]


ASF GitHub Bot commented on HDFS-17780:
---------------------------------------

shangshu-qian opened a new pull request, #7681:
URL: https://github.com/apache/hadoop/pull/7681

   
   ### Description of PR
   
   As described in [HDFS-17780](https://issues.apache.org/jira/browse/HDFS-17780), the retry logic in `sendIBRs()` can bypass the configured `dfs.blockreport.incremental.intervalMsec` and cause a failed IBR to be resent on every heartbeat.
   
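   The fix updates the IBR timestamp (`lastIBR`) every time the RPC is called, whether or not it succeeds, so a failed IBR waits for the configured interval before it is retried.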
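To make the intended behavior concrete, here is a minimal single-threaded sketch of the send path after the fix. It is a hypothetical model, not the code in #7681. Only `lastIBR` comes from the issue. `IbrQueueModel`, `send()`, the `BooleanSupplier` that stands in for the `blockReceivedAndDeleted` RPC, and the use of `Map.putIfAbsent` to model `putMissing()` are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical, simplified model of the IBR send path after the fix.
 * Not Hadoop code: only the name lastIBR is taken from the issue.
 */
class IbrQueueModel {
  // blockId -> state (e.g. "RECEIVED"); insertion-ordered, like a queue
  final Map<Long, String> pending = new LinkedHashMap<>();
  // Time of the last *attempted* IBR; Long.MIN_VALUE means "never".
  long lastIBR = Long.MIN_VALUE;

  /** rpc returns false to model a failed blockReceivedAndDeleted call. */
  void send(long now, BooleanSupplier rpc) {
    if (pending.isEmpty()) {
      return; // nothing new to report
    }
    // Snapshot the pending reports and clear the queue before the RPC.
    Map<Long, String> reports = new LinkedHashMap<>(pending);
    pending.clear();
    boolean success = false;
    try {
      success = rpc.getAsBoolean();
    } finally {
      // The fix: advance lastIBR on every attempt, successful or not,
      // so a failed report waits a full interval before the next try.
      lastIBR = now;
      if (!success) {
        // Re-queue failed reports, but keep any newer entry for the same
        // block that was queued in the meantime (assumed putMissing semantics).
        reports.forEach(pending::putIfAbsent);
      }
    }
  }

  public static void main(String[] args) {
    IbrQueueModel m = new IbrQueueModel();
    m.pending.put(1L, "RECEIVED");
    m.send(1_000L, () -> false); // the NameNode "fails" the RPC
    System.out.println(m.lastIBR + " " + m.pending); // prints: 1000 {1=RECEIVED}
  }
}
```

Because `lastIBR` now advances even when the RPC fails, an interval check based on it (`now - lastIBR >= interval`) defers the retry instead of letting it fire on the next heartbeat.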
   
   ### How was this patch tested?
   
   No test needed.
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> The retry logic in IncrementalBlockReport may bypass the configured IBR 
> interval, causing contention on NameNode
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17780
>                 URL: https://issues.apache.org/jira/browse/HDFS-17780
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.10.2, 3.4.1
>            Reporter: Shangshu Qian
>            Priority: Major
>
> In the current IncrementalBlockReportManager.sendIBRs(), the IBR is retried
> if the RPC (blockReceivedAndDeleted) to the NameNode fails.
>  
> {code:java}
>   void sendIBRs(DatanodeProtocol namenode, DatanodeRegistration registration,
>       String bpid) throws IOException {
>     // Generate a list of the pending reports for each storage under the lock
>     final StorageReceivedDeletedBlocks[] reports = generateIBRs();
>     if (reports.length == 0) {
>       // Nothing new to report.
>       return;
>     }
>     // Send incremental block reports to the Namenode outside the lock
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("call blockReceivedAndDeleted: " + Arrays.toString(reports));
>     }
>     boolean success = false;
>     final long startTime = monotonicNow();
>     try {
>       namenode.blockReceivedAndDeleted(registration, bpid, reports);
>       success = true;
>     } finally {
>       if (success) {
>         dnMetrics.addIncrementalBlockReport(monotonicNow() - startTime);
>         lastIBR = startTime;
>       } else {
>         // If we didn't succeed in sending the report, put all of the
>         // blocks back onto our queue, but only in the case where we
>         // didn't put something newer in the meantime.
>         putMissing(reports);
>       }
>     }
>   }
> {code}
> A failed attempt does not update the `lastIBR` variable; the failed reports
> are put back on the queue and retried. However, because `lastIBR` is not
> updated, the retry bypasses the configured
> `dfs.blockreport.incremental.intervalMsec` and happens on the very next heartbeat.
>  
> If `blockReceivedAndDeleted` fails because the NameNode is under heavy load,
> retrying on every heartbeat only worsens the contention, creating a feedback loop.
>  
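The feedback loop can be illustrated with a toy heartbeat simulation. This is a hypothetical model, not Hadoop code. It assumes that the DataNode checks on each heartbeat whether `now - lastIBR >= interval`, that every `blockReceivedAndDeleted` call fails, and it uses arbitrary example values (heartbeats every 100 ms, a 300 ms IBR interval). The class and method names are invented for illustration.

```java
/**
 * Hypothetical simulation of IBR attempts against a NameNode that rejects
 * every blockReceivedAndDeleted call. Not Hadoop code; all names are invented.
 */
class IbrRetrySimulation {

  /**
   * Counts RPC attempts over the given number of heartbeats.
   * updateOnFailure=false models the original code (lastIBR set only on success);
   * updateOnFailure=true models the proposed fix (lastIBR set on every attempt).
   */
  static int countAttempts(boolean updateOnFailure, long intervalMs,
                           long heartbeatMs, int heartbeats) {
    long lastIBR = -intervalMs; // lets the first heartbeat send immediately
    int attempts = 0;
    for (int i = 0; i < heartbeats; i++) {
      long now = i * heartbeatMs;
      if (now - lastIBR < intervalMs) {
        continue; // interval not yet elapsed: skip this heartbeat
      }
      attempts++;
      // Every RPC fails in this scenario, so only the fixed policy
      // advances lastIBR after the attempt.
      if (updateOnFailure) {
        lastIBR = now;
      }
    }
    return attempts;
  }

  public static void main(String[] args) {
    // 10 heartbeats, 100 ms apart, with a 300 ms IBR interval.
    System.out.println("original: " + countAttempts(false, 300, 100, 10)); // 10
    System.out.println("fixed:    " + countAttempts(true, 300, 100, 10));  // 4
  }
}
```

Under these assumptions the original policy issues an RPC on all 10 heartbeats. Updating `lastIBR` on every attempt limits this to 4 attempts (at 0, 300, 600 and 900 ms).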



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

