[
https://issues.apache.org/jira/browse/HDFS-17780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019467#comment-18019467
]
ASF GitHub Bot commented on HDFS-17780:
---------------------------------------
github-actions[bot] closed pull request #7681: HDFS-17780. The retry logic in
IncrementalBlockReport may bypass the configured IBR interval, causing
contention on NameNode
URL: https://github.com/apache/hadoop/pull/7681
> The retry logic in IncrementalBlockReport may bypass the configured IBR
> interval, causing contention on NameNode
> ----------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-17780
> URL: https://issues.apache.org/jira/browse/HDFS-17780
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, namenode
> Affects Versions: 2.10.2, 3.4.1
> Reporter: Shangshu Qian
> Priority: Major
> Labels: pull-request-available
>
> In the current IncrementalBlockReportManager.sendIBRs(), the IBR is retried
> if the blockReceivedAndDeleted RPC to the NameNode fails.
>
> {code:java}
> void sendIBRs(DatanodeProtocol namenode, DatanodeRegistration registration,
>     String bpid) throws IOException {
>   // Generate a list of the pending reports for each storage under the lock
>   final StorageReceivedDeletedBlocks[] reports = generateIBRs();
>   if (reports.length == 0) {
>     // Nothing new to report.
>     return;
>   }
>
>   // Send incremental block reports to the Namenode outside the lock
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("call blockReceivedAndDeleted: " + Arrays.toString(reports));
>   }
>   boolean success = false;
>   final long startTime = monotonicNow();
>   try {
>     namenode.blockReceivedAndDeleted(registration, bpid, reports);
>     success = true;
>   } finally {
>     if (success) {
>       dnMetrics.addIncrementalBlockReport(monotonicNow() - startTime);
>       lastIBR = startTime;
>     } else {
>       // If we didn't succeed in sending the report, put all of the
>       // blocks back onto our queue, but only in the case where we
>       // didn't put something newer in the meantime.
>       putMissing(reports);
>     }
>   }
> }
> {code}
> On failure, the reports are put back on the queue and `lastIBR` is not
> updated, so the failed IBRs are retried. Because `lastIBR` is unchanged, the
> retry bypasses the configured `dfs.blockreport.incremental.intervalMsec` and
> is sent on the next heartbeat.
>
> If `blockReceivedAndDeleted` fails because of high load on the NameNode,
> such retries only make the contention worse, resulting in a feedback loop.
>
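
The timing effect described in the issue can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not the actual Hadoop code and not the change proposed in the pull request: the class `IbrRetrySketch`, the method `countAttempts`, and the `throttleRetries` flag are hypothetical names, the "due" check is a simplified model of the interval test, and the 30-second IBR interval is an arbitrary example value. It counts how many IBR RPCs a DataNode would attempt while every RPC fails, first when the timestamp is left unchanged on failure, and then when a failed attempt also advances it (one possible mitigation).

```java
/** Simplified model of IBR retry timing; not the actual Hadoop implementation. */
final class IbrRetrySketch {

  /**
   * Counts IBR RPC attempts made while every RPC to the NameNode fails.
   *
   * @param heartbeatMs     gap between heartbeat-loop iterations (assumed)
   * @param ibrIntervalMs   configured minimum gap between IBRs (assumed)
   * @param durationMs      length of the simulated period
   * @param throttleRetries hypothetical switch: if true, a failed attempt
   *                        also advances the timestamp used by the check
   */
  static int countAttempts(long heartbeatMs, long ibrIntervalMs,
      long durationMs, boolean throttleRetries) {
    long lastIBR = 0; // in the quoted code, updated only on success
    int attempts = 0;
    for (long now = ibrIntervalMs; now <= durationMs; now += heartbeatMs) {
      // Simplified "is an IBR due?" check based on the configured interval.
      boolean due = now - ibrIntervalMs >= lastIBR;
      if (!due) {
        continue;
      }
      attempts++; // the RPC is sent and, in this model, always fails
      if (throttleRetries) {
        // Mitigation sketch: remember the failed attempt so that the next
        // retry waits for the full interval instead of the next heartbeat.
        lastIBR = now;
      }
    }
    return attempts;
  }

  public static void main(String[] args) {
    // 3 s heartbeat, 30 s IBR interval (example value), 2 minutes of failures.
    int unthrottled = countAttempts(3_000, 30_000, 120_000, false);
    int throttled = countAttempts(3_000, 30_000, 120_000, true);
    // prints "unthrottled=31 throttled=4"
    System.out.println("unthrottled=" + unthrottled + " throttled=" + throttled);
  }
}
```

In this model, leaving the timestamp untouched on failure produces one RPC per heartbeat for as long as the NameNode keeps failing, while advancing it on failure keeps the retry rate at the configured interval.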