[jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen some times

Yongjun Zhang (JIRA) Thu, 20 Nov 2014 12:15:59 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219922#comment-14219922
 ]


Yongjun Zhang commented on HDFS-7342:
-------------------------------------

Hi Ravi,

No problem, actually you responded pretty fast and I really appreciate it!

Good thoughts! 

To answer your comment#1: for the case where blocks even before the 
penultimate block are COMMITTED, the current code handles it as
{code}
    // Only the last and the penultimate blocks may be in non COMPLETE state.
    // If the penultimate block is not COMPLETE, then it must be COMMITTED.
    if(nrCompleteBlocks < nrBlocks - 2 ||
       nrCompleteBlocks == nrBlocks - 2 &&
         curBlock != null &&
         curBlock.getBlockUCState() != BlockUCState.COMMITTED) {
      final String message = "DIR* NameSystem.internalReleaseLease: "
        + "attempt to release a create lock on "
        + src + " but file is already closed.";
      NameNode.stateChangeLog.warn(message);
      throw new IOException(message);
    }
{code}
which means the lease will be released right away because of the IOException. 
The exception message there is a bit misleading though. I'm actually not so 
sure about the effect of releasing the lease without closing the file (my 
guess is that there might be some bad effect, and it has not been uncovered 
because this code path is not really exercised). But I guess this kind of case 
would be rarer than the penultimate block being COMMITTED and the last block 
being COMPLETE (which I refer to as caseOfInterest). So we could possibly live 
with the current code.
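
To make the condition above concrete, here is a minimal standalone sketch (not 
the actual FSNamesystem code). It assumes that nrCompleteBlocks counts the 
leading run of COMPLETE blocks, that curBlock is the first block that is not 
COMPLETE, and that a file whose blocks are all COMPLETE never reaches the 
check; none of that is shown in the excerpt. The class and method names are 
hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class LeaseCheckSketch {
    // Simplified stand-in for the HDFS BlockUCState enum.
    enum BlockUCState { COMPLETE, COMMITTED, UNDER_CONSTRUCTION, UNDER_RECOVERY }

    // Returns true when the pasted condition would throw the IOException.
    static boolean rejects(List<BlockUCState> blocks) {
        int nrBlocks = blocks.size();
        int nrCompleteBlocks = 0;
        BlockUCState curBlock = null;
        // Assumption: count the leading COMPLETE blocks and remember the
        // first block that is not COMPLETE.
        for (; nrCompleteBlocks < nrBlocks; nrCompleteBlocks++) {
            curBlock = blocks.get(nrCompleteBlocks);
            if (curBlock != BlockUCState.COMPLETE) {
                break;
            }
        }
        // Assumption: a file with only COMPLETE blocks is handled earlier.
        if (nrCompleteBlocks == nrBlocks) {
            return false;
        }
        // Same condition as the excerpt, with the precedence made explicit.
        return nrCompleteBlocks < nrBlocks - 2
            || (nrCompleteBlocks == nrBlocks - 2
                && curBlock != null
                && curBlock != BlockUCState.COMMITTED);
    }

    public static void main(String[] args) {
        // A COMMITTED block before the penultimate block: the check throws.
        System.out.println(rejects(Arrays.asList(
            BlockUCState.COMPLETE, BlockUCState.COMMITTED,
            BlockUCState.COMPLETE, BlockUCState.UNDER_CONSTRUCTION)));
        // caseOfInterest: penultimate COMMITTED, last COMPLETE.
        System.out.println(rejects(Arrays.asList(
            BlockUCState.COMPLETE, BlockUCState.COMMITTED,
            BlockUCState.COMPLETE)));
    }
}
```

Under these assumptions the first call is rejected, while caseOfInterest 
passes this particular check and so has to be handled by the code that 
follows it.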

My suggested approach to handling caseOfInterest is to treat it similarly to 
the case of the penultimate block being COMPLETE and the last block being 
COMMITTED. Another approach is to treat it the same way as the code pasted 
above. But since more people are hitting the caseOfInterest problem, the 
chance that it happens is relatively high. And since we are checking the 
minimal replication before calling finalizeINodeFileUnderConstruction, it 
looks safer to me to close the file before releasing the lease (as my 
suggested fix does).

To answer your comment#2, there are two other callers of the method 
{{finalizeINodeFileUnderConstruction}}: {{FSNamesystem#closeFileCommitBlocks}} 
and {{FSNamesystem#completeFileInternal}}. But the requirement is the same: 
{{finalizeINodeFileUnderConstruction}} expects all blocks to be complete and 
throws an exception otherwise. Since we check minimal replication in 
{{internalReleaseLease}} before calling {{finalizeINodeFileUnderConstruction}}, 
I think we should call {{getBlockManager().forceCompleteBlock}} before calling 
{{finalizeINodeFileUnderConstruction}} in the suggested fix. This sounds like 
a safer solution than the pasted code above.
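
The ordering argued for above (check minimal replication, force-complete the 
COMMITTED penultimate block, then finalize) can be sketched with a toy model. 
This is only an illustration under stated assumptions, not the attached patch: 
the Block class, MIN_REPLICATION, finalizeFile and recoverCaseOfInterest are 
hypothetical stand-ins, and the real {{forceCompleteBlock}} and 
{{finalizeINodeFileUnderConstruction}} take different arguments.

```java
import java.util.Arrays;
import java.util.List;

class ForceCompleteSketch {
    enum State { COMPLETE, COMMITTED }

    // Toy block: a state plus a count of live replicas (hypothetical model).
    static final class Block {
        State state;
        final int liveReplicas;

        Block(State state, int liveReplicas) {
            this.state = state;
            this.liveReplicas = liveReplicas;
        }
    }

    // Assumed minimal replication for this sketch.
    static final int MIN_REPLICATION = 1;

    // Stand-in for finalizeINodeFileUnderConstruction: every block must be
    // COMPLETE, otherwise an exception is thrown.
    static void finalizeFile(List<Block> blocks) {
        for (Block b : blocks) {
            if (b.state != State.COMPLETE) {
                throw new IllegalStateException("block is not COMPLETE");
            }
        }
    }

    // Returns true if the file was closed, false if it was left open.
    static boolean recoverCaseOfInterest(List<Block> blocks) {
        if (blocks.size() < 2) {
            return false;
        }
        Block penultimate = blocks.get(blocks.size() - 2);
        Block last = blocks.get(blocks.size() - 1);
        if (penultimate.state != State.COMMITTED || last.state != State.COMPLETE) {
            return false; // not caseOfInterest
        }
        if (penultimate.liveReplicas < MIN_REPLICATION) {
            return false; // not minimally replicated yet, keep the file open
        }
        penultimate.state = State.COMPLETE; // stand-in for forceCompleteBlock
        finalizeFile(blocks);               // now all blocks are COMPLETE
        return true;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
            new Block(State.COMPLETE, 3),
            new Block(State.COMMITTED, 1),
            new Block(State.COMPLETE, 1));
        System.out.println(recoverCaseOfInterest(blocks));
    }
}
```

In this model the file is only closed once the COMMITTED block meets minimal 
replication, so the finalize step never sees a block that is not COMPLETE.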

Comments?

Thanks.





> Lease Recovery doesn't happen some times
> ----------------------------------------
>
>                 Key: HDFS-7342
>                 URL: https://issues.apache.org/jira/browse/HDFS-7342
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch
>
>
> In some cases, LeaseManager tries to recover a lease, but is not able to. 
> HDFS-4882 describes a possibility of that. We should fix this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
