tomscut opened a new pull request #3787:
URL: https://github.com/apache/hadoop/pull/3787


   JIRA: [HDFS-16379](https://issues.apache.org/jira/browse/HDFS-16379).
   
   We recently ran into full block report (FBR) problems in our production 
environment, which we resolved by applying HDFS-12914 and HDFS-14314.
   
   However, the following situation can still occur:
   1. The DN obtains a `fullBlockReportLeaseId` via heartbeat.
   
   2. The DN triggers a blockReport, but an exception occurs (rare, but 
possible). The DN then retries multiple times without resetting 
`fullBlockReportLeaseId`, because currently it is reset only when the 
blockReport succeeds.
   
   3. After a while the exception clears, but by then the 
`fullBlockReportLeaseId` has expired. Because the NN does not throw an 
exception for an expired lease, the DN considers the blockReport successful. 
In fact the blockReport was not processed this time, and the DN must wait 
until the next one.
   
   Therefore, should we consider resetting the `fullBlockReportLeaseId` in a 
finally block? The advantage is that the DN never retries with an expired 
lease. The downside is that, while the exception persists, each heartbeat 
requests a new `fullBlockReportLeaseId`, but I think this cost is 
negligible.
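
   A minimal sketch of the proposed reset, assuming a simplified DN-side 
holder for the lease id. `LeaseResetSketch`, `ReportRpc`, and the method names 
are hypothetical and do not reproduce the actual patch. The point is only that 
the reset in `finally` runs whether the report succeeds or throws.

```java
import java.io.IOException;

/** Toy model of the DN side; hypothetical, not the actual DataNode code. */
public class LeaseResetSketch {
    /** Hypothetical stand-in for the block report RPC to the NN. */
    public interface ReportRpc {
        void send(long leaseId) throws IOException;
    }

    // 0 means "no lease held"; the next heartbeat should request one.
    private long fullBlockReportLeaseId = 0;

    /** Called when a heartbeat response carries a lease id. */
    public void onHeartbeatLease(long leaseId) {
        fullBlockReportLeaseId = leaseId;
    }

    /** True when the next heartbeat should ask for a new lease. */
    public boolean needsLease() {
        return fullBlockReportLeaseId == 0;
    }

    /**
     * Attempts one block report. The lease id is cleared in finally, so a
     * failed attempt cannot be retried later with a stale lease.
     */
    public boolean tryBlockReport(ReportRpc rpc) {
        if (fullBlockReportLeaseId == 0) {
            return false; // no lease yet, nothing to send
        }
        try {
            rpc.send(fullBlockReportLeaseId);
            return true;
        } catch (IOException e) {
            return false; // the caller retries after a new lease arrives
        } finally {
            fullBlockReportLeaseId = 0; // reset on success and on failure
        }
    }
}
```

   With this shape, a failed attempt leaves `needsLease()` returning true, so 
the retry only happens after a heartbeat has supplied a fresh lease. That is 
the extra per-heartbeat cost mentioned above.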


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


