tomscut opened a new pull request #3787: URL: https://github.com/apache/hadoop/pull/3787
JIRA: [HDFS-16379](https://issues.apache.org/jira/browse/HDFS-16379).

Recently we encountered FBR (full block report) related problems in the production environment, which were solved by introducing HDFS-12914 and HDFS-14314. But the following situation may still occur:

1. The DN gets a `fullBlockReportLeaseId` via heartbeat.
2. The DN triggers a blockReport, but an exception occurs (this may be rare, but it can happen), and the DN then retries multiple times without resetting `fullBlockReportLeaseId`, because currently `fullBlockReportLeaseId` is reset only if the blockReport succeeds.
3. After a while the exception clears, but by then the `fullBlockReportLeaseId` has expired. Since the NN does not throw an exception when the lease has expired, the DN considers the blockReport successful. So the blockReport was not actually processed this time, and it has to wait until the next one.

Therefore, should we consider resetting the `fullBlockReportLeaseId` in the finally block? The advantage is that the DN no longer retries with an expired lease. The downside is that, while the exception persists, each heartbeat will apply for a new `fullBlockReportLeaseId`, but I think this cost is negligible.
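
To make the scenario concrete, here is a minimal, self-contained sketch of the two behaviors. It is not the actual Hadoop code: `FakeNameNode`, `DataNodeSketch`, `runScenario`, and the `resetInFinally` flag are hypothetical names used only for illustration, and the lease handling is heavily simplified (a single lease, no timers, and lease ID 0 is assumed to mean "no lease held").

```java
import java.io.IOException;

final class LeaseResetSketch {
    /** Hypothetical stand-in for the NN side; not the real Hadoop API. */
    static final class FakeNameNode {
        private long nextLeaseId = 1;
        private long validLeaseId = 0;   // 0 means no lease is currently valid
        boolean failRpc = false;         // simulates the transient exception
        int processedReports = 0;

        long requestLease() {
            validLeaseId = nextLeaseId++;
            return validLeaseId;
        }

        void expireLease() {
            validLeaseId = 0;
        }

        void blockReport(long leaseId) throws IOException {
            if (failRpc) {
                throw new IOException("simulated RPC failure");
            }
            if (leaseId == 0 || leaseId != validLeaseId) {
                return; // expired lease: report is skipped, but no exception is thrown
            }
            processedReports++;
            validLeaseId = 0; // lease is consumed by a processed report
        }
    }

    /** Hypothetical, simplified DN-side state. */
    static final class DataNodeSketch {
        private final FakeNameNode nn;
        private final boolean resetInFinally;
        long fullBlockReportLeaseId = 0;

        DataNodeSketch(FakeNameNode nn, boolean resetInFinally) {
            this.nn = nn;
            this.resetInFinally = resetInFinally;
        }

        /** Asks for a new lease only when none is held. */
        void heartbeat() {
            if (fullBlockReportLeaseId == 0) {
                fullBlockReportLeaseId = nn.requestLease();
            }
        }

        boolean blockReport() {
            boolean success = false;
            try {
                nn.blockReport(fullBlockReportLeaseId);
                success = true;
                return true;
            } catch (IOException e) {
                return false;
            } finally {
                // Current behavior: reset only on success.
                // Proposed behavior: reset unconditionally.
                if (success || resetInFinally) {
                    fullBlockReportLeaseId = 0;
                }
            }
        }
    }

    /** Replays steps 1 to 3 and returns how many reports the NN processed. */
    static int runScenario(boolean resetInFinally) {
        FakeNameNode nn = new FakeNameNode();
        DataNodeSketch dn = new DataNodeSketch(nn, resetInFinally);
        dn.heartbeat();        // step 1: DN gets a lease
        nn.failRpc = true;
        dn.blockReport();      // step 2: blockReport fails with an exception
        nn.expireLease();      // step 3: lease expires while the DN is retrying
        nn.failRpc = false;
        dn.heartbeat();        // requests a new lease only if the old ID was reset
        dn.blockReport();
        return nn.processedReports;
    }

    public static void main(String[] args) {
        System.out.println("reset only on success: " + runScenario(false)); // prints 0
        System.out.println("reset in finally: " + runScenario(true));       // prints 1
    }
}
```

In this sketch, with the reset only on success, the DN reuses the expired lease ID, the NN silently skips the report, and the DN treats it as done. With the reset in the finally block, the next heartbeat obtains a fresh lease and the report is processed.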
