[jira] [Updated] (HDFS-16379) Reset fullBlockReportLeaseId after any exceptions

tomscut (Jira) Fri, 10 Dec 2021 18:52:05 -0800


     [ https://issues.apache.org/jira/browse/HDFS-16379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


tomscut updated HDFS-16379:
---------------------------
    Description: 
We recently encountered full block report (FBR) problems in our production 
environment, which we resolved by applying HDFS-12914 and HDFS-14314.

However, the following situation can still occur:
1. The DN gets a *fullBlockReportLeaseId* via heartbeat.

2. The DN triggers a blockReport, but an exception occurs (this may be rare, 
but it can happen), and the DN then retries multiple times *without resetting 
fullBlockReportLeaseId*, because fullBlockReportLeaseId is currently reset 
only when the blockReport succeeds.

3. After a while, the exception clears, but by then the fullBlockReportLeaseId 
has expired. *Since the NN does not throw an exception when the lease has 
expired, the DN considers the blockReport successful.* So the blockReport is 
not actually processed this time, and the DN has to wait until the next one.

Therefore, *should we consider resetting the fullBlockReportLeaseId in the 
finally block*? The advantage is that the DN never retries with an expired 
lease. The downside is that, while the exception persists, each heartbeat 
will request a new fullBlockReportLeaseId, but I think this cost is 
negligible.
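
To make the proposal concrete, here is a minimal, self-contained sketch of the 
idea. It is not the actual DataNode code: apart from fullBlockReportLeaseId, 
every name (LeaseResetSketch, BlockReportSender, onHeartbeatLease, 
reportOnce) is hypothetical, and treating 0 as "no lease held" is an 
assumption made for this illustration.

```java
import java.io.IOException;

public class LeaseResetSketch {
    /** Hypothetical stand-in for the RPC that sends a full block report. */
    interface BlockReportSender {
        void send(long leaseId) throws IOException;
    }

    // Assumption for this sketch: 0 means "no lease held", so the next
    // heartbeat would ask the NN for a fresh lease.
    private long fullBlockReportLeaseId = 0;

    /** Step 1: a heartbeat response hands the DN a lease. */
    void onHeartbeatLease(long leaseId) {
        fullBlockReportLeaseId = leaseId;
    }

    long currentLeaseId() {
        return fullBlockReportLeaseId;
    }

    /** One report attempt; the lease is dropped whether or not it succeeds. */
    boolean reportOnce(BlockReportSender sender) {
        try {
            sender.send(fullBlockReportLeaseId);
            return true;
        } catch (IOException e) {
            // A retry will happen later, but with a newly requested lease.
            return false;
        } finally {
            // The proposed change: reset here instead of only on success.
            fullBlockReportLeaseId = 0;
        }
    }

    public static void main(String[] args) {
        LeaseResetSketch dn = new LeaseResetSketch();
        dn.onHeartbeatLease(42L);
        boolean ok = dn.reportOnce(id -> {
            throw new IOException("simulated failure");
        });
        // prints "success=false, leaseId=0"
        System.out.println("success=" + ok + ", leaseId=" + dn.currentLeaseId());
    }
}
```

Because the reset sits in the finally block, a failed attempt leaves no stale 
lease behind, so a later retry cannot be silently ignored by the NN on account 
of an expired lease.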

  was:
We recently encountered full block report (FBR) problems in our production 
environment, which we resolved by applying HDFS-12914 and HDFS-14314.

However, the following situation can still occur:
1. The DN gets a *fullBlockReportLeaseId* via heartbeat.

2. The DN triggers a blockReport, but an exception occurs (this may be rare, 
but it can happen), and the DN then retries multiple times *without resetting 
leaseID*, because leaseID is currently reset only when the blockReport 
succeeds.

3. After a while, the exception clears, but by then the leaseID has expired. 
*Since the NN does not throw an exception when the lease has expired, the DN 
considers the blockReport successful.* So the blockReport is not actually 
processed this time, and the DN has to wait until the next one.

Therefore, *should we consider resetting the fullBlockReportLeaseId in the 
finally block*? The advantage is that the DN never retries with an expired 
lease. The downside is that, while the exception persists, each heartbeat 
will request a new fullBlockReportLeaseId, but I think this cost is 
negligible.


> Reset fullBlockReportLeaseId after any exceptions
> -------------------------------------------------
>
>                 Key: HDFS-16379
>                 URL: https://issues.apache.org/jira/browse/HDFS-16379
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: tomscut
>            Assignee: tomscut
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

