[ 
https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010646#comment-16010646
 ] 

Kihwal Lee commented on HDFS-11817:
-----------------------------------

There are two possible immediate fixes:
- Allow {{commitBlockSynchronization()}} to complete even if a received-IBR has 
not arrived (i.e. the last block is not in COMPLETE state). This is equivalent to 
allowing the file to be closed without the last block being COMPLETE.
- Do not allow {{LeaseManager}} to blindly remove the lease on a lease recovery 
failure and leave the inode in under-construction state.

Both are simple changes that will prevent the issues from happening.
However, I haven't been able to find the root cause of how and where the NPE 
is happening. It comes from calling {{getBlockLocations()}}, but so far I have 
not been able to reproduce it. I will find other means to root-cause it.
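
To illustrate the second fix, here is a minimal toy model of the lease bookkeeping. It is only a sketch: {{ToyLeaseManager}}, {{Recovery}}, {{RecoveryFailedException}} and the {{keepLeaseOnFailure}} flag are hypothetical names made up for this example and are not the real HDFS classes or an actual patch. It assumes that a file is "leaked" when it is still under construction but no longer has a lease, which is the inconsistent state described in this issue.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: none of these types are the real HDFS classes. */
public class LeaseLeakSketch {

    /** Stand-in for the recovery failure (the report mentions AlreadyBeingCreatedException). */
    static class RecoveryFailedException extends Exception {
        RecoveryFailedException(String msg) { super(msg); }
    }

    /** Hypothetical hook that tries to finalize (close) the file at a path. */
    interface Recovery {
        void recover(String path) throws RecoveryFailedException;
    }

    static class ToyLeaseManager {
        // path -> lease holder; a path present here is still eligible for recovery
        final Map<String, String> leases = new HashMap<>();
        // path -> true while the file is still under construction
        final Map<String, Boolean> underConstruction = new HashMap<>();
        // hypothetical switch between the reported behaviour and the proposed one
        final boolean keepLeaseOnFailure;

        ToyLeaseManager(boolean keepLeaseOnFailure) {
            this.keepLeaseOnFailure = keepLeaseOnFailure;
        }

        void addLease(String path, String holder) {
            leases.put(path, holder);
            underConstruction.put(path, true);
        }

        /** One recovery attempt for a single expired lease. */
        void recoverLease(String path, Recovery recovery) {
            try {
                recovery.recover(path);
                // Success: the file is finalized and the lease can go away.
                underConstruction.put(path, false);
                leases.remove(path);
            } catch (RecoveryFailedException e) {
                if (!keepLeaseOnFailure) {
                    // Reported behaviour: the lease is dropped although the
                    // file is still under construction.
                    leases.remove(path);
                }
                // Proposed behaviour: keep the lease so a later pass can retry.
            }
        }

        /** Under construction with no lease: no further recovery will be attempted. */
        boolean isLeaked(String path) {
            return underConstruction.getOrDefault(path, false) && !leases.containsKey(path);
        }
    }

    public static void main(String[] args) {
        Recovery alwaysFails = path -> {
            throw new RecoveryFailedException("no received-IBR for " + path);
        };
        for (boolean keep : new boolean[] {false, true}) {
            ToyLeaseManager manager = new ToyLeaseManager(keep);
            manager.addLease("/data/file", "client-1");
            manager.recoverLease("/data/file", alwaysFails);
            System.out.println("keepLeaseOnFailure=" + keep
                    + " leaked=" + manager.isLeaked("/data/file"));
        }
    }
}
```

In this model, keeping the lease on failure means a later recovery attempt can still finalize the file once the recovery hook succeeds. How the real {{LeaseManager}} would retry is not covered here.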

> A faulty node can cause a lease leak and NPE on accessing data
> --------------------------------------------------------------
>
>                 Key: HDFS-11817
>                 URL: https://issues.apache.org/jira/browse/HDFS-11817
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>
> When the namenode performs a lease recovery for a failed write, 
> {{commitBlockSynchronization()}} will fail if none of the new targets has 
> sent a received-IBR.  At this point, the data is inaccessible, as the 
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried in about an hour by the namenode. If the 
> nodes are faulty (usually when there is only one new target), they may not 
> have sent a block report by then. If this happens, lease recovery throws an 
> {{AlreadyBeingCreatedException}}, which causes {{LeaseManager}} to simply 
> remove the lease without finalizing the inode.
> This results in an inconsistent lease state. The inode stays 
> under-construction, but no more lease recovery is attempted. A manual lease 
> recovery is also not allowed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
