[ https://issues.apache.org/jira/browse/HDFS-17780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HDFS-17780:
----------------------------------
    Labels: pull-request-available  (was: )

> The retry logic in IncrementalBlockReport may bypass the configured IBR interval, causing contention on NameNode
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17780
>                 URL: https://issues.apache.org/jira/browse/HDFS-17780
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.10.2, 3.4.1
>            Reporter: Shangshu Qian
>            Priority: Major
>              Labels: pull-request-available
>
> In the current IncrementalBlockReportManager.sendIBRs(), the IBR is retried if
> the RPC (blockReceivedAndDeleted) to NN fails.
>
> {code:java}
> void sendIBRs(DatanodeProtocol namenode, DatanodeRegistration registration,
>     String bpid) throws IOException {
>   // Generate a list of the pending reports for each storage under the lock
>   final StorageReceivedDeletedBlocks[] reports = generateIBRs();
>   if (reports.length == 0) {
>     // Nothing new to report.
>     return;
>   }
>
>   // Send incremental block reports to the Namenode outside the lock
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("call blockReceivedAndDeleted: " + Arrays.toString(reports));
>   }
>   boolean success = false;
>   final long startTime = monotonicNow();
>   try {
>     namenode.blockReceivedAndDeleted(registration, bpid, reports);
>     success = true;
>   } finally {
>     if (success) {
>       dnMetrics.addIncrementalBlockReport(monotonicNow() - startTime);
>       lastIBR = startTime;
>     } else {
>       // If we didn't succeed in sending the report, put all of the
>       // blocks back onto our queue, but only in the case where we
>       // didn't put something newer in the meantime.
>       putMissing(reports);
>     }
>   }
> }
> {code}
> The failure path does not update the `lastIBR` variable, so the failed IBRs will be
> retried.
> However, this retry bypasses the configured
> `dfs.blockreport.incremental.intervalMsec` and will be retried on the next
> heartbeat because `lastIBR` is not updated.
>
> If `blockReceivedAndDeleted` fails due to high load on the NameNode,
> such retries will only make the contention worse, resulting in a feedback loop.
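
The effect described in the issue can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the Hadoop code and not the proposed patch: the class `IbrRetrySketch`, the method `countAttempts`, and the `throttleFailedRetries` flag are hypothetical names, the send gate is reduced to a single "interval elapsed since `lastIBR`" check evaluated once per heartbeat, and the NameNode is assumed to reject every RPC. The 3 s heartbeat and 30 s IBR interval are illustrative values.

```java
/** Toy model of the IBR retry timing described in HDFS-17780 (not Hadoop code). */
final class IbrRetrySketch {

    /**
     * Counts how many blockReceivedAndDeleted-style RPCs a DataNode would issue
     * while the NameNode keeps failing.
     *
     * @param throttleFailedRetries hypothetical switch: if true, a failed attempt
     *                              also advances lastIBR, so the retry waits for
     *                              the configured interval
     * @param ibrIntervalMs         models dfs.blockreport.incremental.intervalMsec
     * @param heartbeatMs           time between heartbeats
     * @param durationMs            length of the simulated outage
     */
    static int countAttempts(boolean throttleFailedRetries, long ibrIntervalMs,
                             long heartbeatMs, long durationMs) {
        long lastIBR = -ibrIntervalMs; // allow the first report at t = 0
        int attempts = 0;
        for (long now = 0; now < durationMs; now += heartbeatMs) {
            boolean due = now - lastIBR >= ibrIntervalMs;
            if (!due) {
                continue; // interval not yet elapsed, nothing is sent
            }
            attempts++; // RPC issued; in this model it always fails
            boolean success = false;
            if (success || throttleFailedRetries) {
                lastIBR = now; // the behaviour in the issue updates only on success
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Behaviour as described in the issue: retried on every heartbeat.
        System.out.println(countAttempts(false, 30_000, 3_000, 60_000)); // prints 20
        // Hypothetical variant that honours the interval after a failure.
        System.out.println(countAttempts(true, 30_000, 3_000, 60_000));  // prints 2
    }
}
```

In this model, a one-minute NameNode outage produces 20 attempts per DataNode when failures leave `lastIBR` untouched, versus 2 when the failed attempt is also rate-limited, which is the kind of amplification the issue calls a feedback loop.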