[
https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219922#comment-14219922
]
Yongjun Zhang commented on HDFS-7342:
-------------------------------------
Hi Ravi,
No problem, actually you responded pretty fast and I really appreciate it!
Good thoughts!
To answer your comment #1: for the case where blocks even before the
penultimate block are COMMITTED, the current code handles it as follows:
{code}
    // Only the last and the penultimate blocks may be in non COMPLETE state.
    // If the penultimate block is not COMPLETE, then it must be COMMITTED.
    if(nrCompleteBlocks < nrBlocks - 2 ||
       nrCompleteBlocks == nrBlocks - 2 &&
         curBlock != null &&
         curBlock.getBlockUCState() != BlockUCState.COMMITTED) {
      final String message = "DIR* NameSystem.internalReleaseLease: "
        + "attempt to release a create lock on "
        + src + " but file is already closed.";
      NameNode.stateChangeLog.warn(message);
      throw new IOException(message);
    }
{code}
which means the lease will be released right away because of the IOException.
The exception message there is a bit misleading, though. I'm actually not sure
what effect releasing the lease without closing the file has (my guess is that
there might be some bad effect which has not been uncovered because this code
path is not really exercised). But I guess this kind of case is rarer than the
penultimate block being COMMITTED and the last block being COMPLETE (which I
refer to as caseOfInterest), so we could possibly live with the current code.
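To make the cases concrete, here is a minimal standalone sketch of the guard
quoted above. It is not the FSNamesystem code: the class name, the trimmed-down
{{BlockUCState}} enum and the {{guardRejects}} helper are made up for
illustration, and it assumes that {{nrCompleteBlocks}} counts the leading run of
COMPLETE blocks and that {{curBlock}} is the first non-COMPLETE block (the
pasted snippet does not show how they are computed).

```java
import java.util.Arrays;
import java.util.List;

public class ReleaseLeaseGuardSketch {
    // Hypothetical, trimmed-down stand-in for the real block state enum.
    enum BlockUCState { COMPLETE, COMMITTED, UNDER_CONSTRUCTION }

    /** Returns true when the quoted guard would throw the IOException. */
    static boolean guardRejects(List<BlockUCState> blocks) {
        int nrBlocks = blocks.size();
        int nrCompleteBlocks = 0;
        BlockUCState curBlock = null;
        // Assumption: count leading COMPLETE blocks and stop at the first
        // block that is not COMPLETE; curBlock then refers to that block.
        for (; nrCompleteBlocks < nrBlocks; nrCompleteBlocks++) {
            curBlock = blocks.get(nrCompleteBlocks);
            if (curBlock != BlockUCState.COMPLETE) {
                break;
            }
        }
        // Same boolean structure as the pasted condition.
        return nrCompleteBlocks < nrBlocks - 2
            || (nrCompleteBlocks == nrBlocks - 2
                && curBlock != null
                && curBlock != BlockUCState.COMMITTED);
    }

    public static void main(String[] args) {
        BlockUCState c = BlockUCState.COMPLETE;
        BlockUCState m = BlockUCState.COMMITTED;
        // A COMMITTED block before the penultimate one: guard throws.
        System.out.println(guardRejects(Arrays.asList(c, m, c, c))); // true
        // caseOfInterest (penultimate COMMITTED, last COMPLETE): guard passes.
        System.out.println(guardRejects(Arrays.asList(c, m, c)));    // false
        // Penultimate COMPLETE, last COMMITTED: guard passes.
        System.out.println(guardRejects(Arrays.asList(c, c, m)));    // false
    }
}
```

Under these assumptions, caseOfInterest gets past the guard, so whatever
happens to it is decided by the code that follows the guard, not by the
IOException path.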
My suggested approach to handling caseOfInterest is to treat it like the case
of the penultimate block being COMPLETE and the last block being COMMITTED.
Another approach is to treat it the same way as the code pasted above. But
since more people are hitting the caseOfInterest problem, the chance that it
happens is relatively high. And since we check the minimal replication before
calling finalizeINodeFileUnderConstruction, closing the file before releasing
the lease looks safer to me (as my suggested fix does).
To answer your comment #2: there are two other callers of the method
{{finalizeINodeFileUnderConstruction}}: {{FSNamesystem#closeFileCommitBlocks}}
and {{FSNamesystem#completeFileInternal}}. But the requirement is the same:
{{finalizeINodeFileUnderConstruction}} expects all blocks to be complete and
throws an exception otherwise. Since we check minimal replication in
{{internalReleaseLease}} before calling {{finalizeINodeFileUnderConstruction}},
I think we should call {{getBlockManager().forceCompleteBlock}} before calling
{{finalizeINodeFileUnderConstruction}} in the suggested fix. This sounds like a
safer solution than the pasted code above.
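The ordering I have in mind can be modelled roughly as below. This is only a
toy model, not the patch: {{Block}}, {{MIN_REPLICATION}}, {{forceComplete}},
{{finalizeFile}} and {{releaseLease}} are hypothetical stand-ins, and the real
{{forceCompleteBlock}} and {{finalizeINodeFileUnderConstruction}} take
different parameters.

```java
import java.util.Arrays;
import java.util.List;

public class ForceCompleteBeforeFinalizeSketch {
    enum State { COMPLETE, COMMITTED }

    // Hypothetical block model: a state plus a live replica count.
    static final class Block {
        State state;
        final int liveReplicas;
        Block(State state, int liveReplicas) {
            this.state = state;
            this.liveReplicas = liveReplicas;
        }
    }

    // Hypothetical stand-in for the configured minimal replication.
    static final int MIN_REPLICATION = 1;

    // Stand-in for forceCompleteBlock: move a COMMITTED block to COMPLETE.
    static void forceComplete(Block b) {
        b.state = State.COMPLETE;
    }

    // Stand-in for finalizeINodeFileUnderConstruction: it expects every
    // block to be COMPLETE and throws otherwise.
    static void finalizeFile(List<Block> blocks) {
        for (Block b : blocks) {
            if (b.state != State.COMPLETE) {
                throw new IllegalStateException("block is not COMPLETE");
            }
        }
    }

    /** Returns true if the file was closed and the lease can be released. */
    static boolean releaseLease(List<Block> blocks) {
        // 1. Check minimal replication of every COMMITTED block first.
        for (Block b : blocks) {
            if (b.state == State.COMMITTED && b.liveReplicas < MIN_REPLICATION) {
                return false; // not safe to close the file yet
            }
        }
        // 2. Force-complete the COMMITTED blocks.
        for (Block b : blocks) {
            if (b.state == State.COMMITTED) {
                forceComplete(b);
            }
        }
        // 3. Only now finalize; its all-COMPLETE precondition holds.
        finalizeFile(blocks);
        return true;
    }

    public static void main(String[] args) {
        // caseOfInterest: penultimate COMMITTED, last COMPLETE.
        List<Block> blocks = Arrays.asList(
            new Block(State.COMPLETE, 3),
            new Block(State.COMMITTED, 1),
            new Block(State.COMPLETE, 3));
        System.out.println(releaseLease(blocks));
    }
}
```

In this model, calling {{finalizeFile}} directly on caseOfInterest would throw,
which is why the force-complete step comes first.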
Comments?
Thanks.
> Lease Recovery doesn't happen some times
> ----------------------------------------
>
> Key: HDFS-7342
> URL: https://issues.apache.org/jira/browse/HDFS-7342
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.0.0-alpha
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch
>
>
> In some cases, LeaseManager tries to recover a lease, but is not able to.
> HDFS-4882 describes a possibility of that. We should fix this