[ https://issues.apache.org/jira/browse/HDFS-11817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020252#comment-16020252 ]
Kihwal Lee commented on HDFS-11817:
-----------------------------------
In trunk, there is already logic to weed out a null StorageInfo before adding
one to the expected locations. This was done as part of HDFS-9040. That change
also caused TestRetryCacheWithHA to fail, and the failure was fixed as part of
HDFS-9040 as well, although I believe my fix is better. Because HDFS-9040 is
an EC-related change, it cannot be applied to branch-2. I will back-port the
relevant portion in my patch, so that trunk and branch-2/2.8 stay more in
sync. The trunk version of my patch will contain the test case (HDFS-9040 did
not add a new test case for this) and the lease manager fix.
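
For readers who have not looked at the patch, the idea of weeding out null
entries can be sketched roughly as below. This is only an illustration, not
the HDFS-9040 or HDFS-11817 code: StorageEntry and ExpectedLocations are
hypothetical stand-ins, and the sketch assumes a null entry means the namenode
has no storage record for that target.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a storage record; NOT the real HDFS class. */
final class StorageEntry {
    final String storageId;

    StorageEntry(String storageId) {
        this.storageId = storageId;
    }
}

/** Hypothetical helper; the name and shape are assumptions for illustration. */
final class ExpectedLocations {
    /**
     * Returns the non-null entries in their original order, so that later
     * code iterating over the expected locations never dereferences null.
     */
    static List<StorageEntry> dropNulls(StorageEntry[] reported) {
        List<StorageEntry> kept = new ArrayList<>();
        if (reported == null) {
            return kept; // nothing reported at all: empty list, not null
        }
        for (StorageEntry entry : reported) {
            if (entry != null) {
                kept.add(entry);
            }
        }
        return kept;
    }
}
```

With this shape, a caller still has to decide what to do when the filtered
list comes back empty, which is the case where no new target has reported.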
> A faulty node can cause a lease leak and NPE on accessing data
> --------------------------------------------------------------
>
> Key: HDFS-11817
> URL: https://issues.apache.org/jira/browse/HDFS-11817
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.8.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Attachments: HDFS-11817.branch-2.patch, hdfs-11817_supplement.txt
>
>
> When the namenode performs a lease recovery for a failed write,
> {{commitBlockSynchronization()}} will fail if none of the new targets has
> sent a received-IBR. At this point, the data is inaccessible, as the
> namenode will throw a {{NullPointerException}} upon {{getBlockLocations()}}.
> The lease recovery will be retried in about an hour by the namenode. If the
> nodes are faulty (usually when there is only one new target), they may still
> not have sent a block report by then. If this happens, lease recovery throws
> an {{AlreadyBeingCreatedException}}, which causes LeaseManager to simply
> remove the lease without finalizing the inode.
> This results in an inconsistent lease state. The inode stays
> under-construction, but no more lease recovery is attempted. A manual lease
> recovery is also not allowed.
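
The inconsistent state described in the quoted report can be modelled with a
small toy class. This is a sketch under stated assumptions, not the
LeaseManager implementation or the fix in the patch: ToyLeaseTable and all of
its methods are hypothetical names, and the model only tracks two facts per
path (under construction, lease held).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model, NOT HDFS code: tracks open files and their lease holders. */
final class ToyLeaseTable {
    private final Set<String> underConstruction = new HashSet<>();
    private final Map<String, String> leaseHolderByPath = new HashMap<>();

    /** Opening a file for write marks it under construction and adds a lease. */
    void openForWrite(String path, String holder) {
        underConstruction.add(path);
        leaseHolderByPath.put(path, holder);
    }

    /** Models the reported failure: the lease goes away, the file stays open. */
    void dropLeaseOnly(String path) {
        leaseHolderByPath.remove(path);
    }

    /** The consistent path: finalize the file and release its lease together. */
    void finalizeAndRelease(String path) {
        underConstruction.remove(path);
        leaseHolderByPath.remove(path);
    }

    /** True when a path is still under construction but nobody holds a lease. */
    boolean isLeaked(String path) {
        return underConstruction.contains(path)
            && !leaseHolderByPath.containsKey(path);
    }
}
```

In this model, calling dropLeaseOnly() on an open path makes isLeaked() return
true, which corresponds to the state where no further lease recovery is
attempted even though the file was never finalized.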