[ 
https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218085#comment-14218085
 ] 

Yongjun Zhang commented on HDFS-7342:
-------------------------------------

HI Ravi,

Thanks a lot for your continued effort to work on this!

I had some concern of changing {{assertAllBlocksComplete()}} to 
{{assertAllBlocksCompleteOrCommitted()}}, because 
{{finalizeINodeFileUnderConstruction}} checks to make sure all blocks are 
complete (which means all blocks need to have minimal replication) before 
closing a file. Your suggested change would change the behavior here, and I 
think it may not be safe.

Good thing is, the place that we are fixing does check the minimal replication 
requirement before calling {{finalizeINodeFileUnderConstruction}}. So I was 
looking into addressing this issue in a different approach on top of my earlier 
suggested fix at 
https://issues.apache.org/jira/browse/HDFS-4882?focusedCommentId=14215703&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14215703.
 

Here is the complete snapshot of my approach:

{code}
    // If penultimate block doesn't exist then its minReplication is met
    boolean penultimateBlockMinReplication = penultimateBlock == null ? true :
        blockManager.checkMinReplication(penultimateBlock);
    BlockUCState penultimateBlockState = penultimateBlock == null ?
        BlockUCState.COMPLETE : penultimateBlock.getBlockUCState();
    String blockStateStr = "(penultimateBlockState=" + penultimateBlockState +
        " lastBlockState="+lastBlockState + ")";

    switch(lastBlockState) {
    case COMPLETE:
      // Getting here means the penultimate block is COMMITTED, fallthrough
      // to handle the same way as when the final block is COMMITTED
    case COMMITTED:
      blockStateStr = "(penultimateBlockState=" + penultimateBlockState +
        " lastBlockState="+lastBlockState + ")";
      // Close file if committed blocks are minimally replicated
      if(penultimateBlockMinReplication &&
          blockManager.checkMinReplication(lastBlock)) {
        if (penultimateBlock != null &&
            penultimateBlockState == BlockUCState.COMMITTED) {
          getBlockManager().forceCompleteBlock(pendingFile,
              (BlockInfoUnderConstruction)penultimateBlock);
        }
        getBlockManager().forceCompleteBlock(pendingFile,
            (BlockInfoUnderConstruction) lastBlock);
        finalizeINodeFileUnderConstruction(src, pendingFile,
            iip.getLatestSnapshotId());
        NameNode.stateChangeLog.warn("BLOCK*"
          + " internalReleaseLease: Committed blocks are minimally replicated,"
          + " lease removed, file closed " + blockStateStr + ".");
        return true;  // closed!
      }
      // Cannot close file right now, since some blocks
      // are not yet minimally replicated.
      // This may potentially cause infinite loop in lease recovery
      // if there are no valid replicas on data-nodes.
      String message = "DIR* NameSystem.internalReleaseLease: " +
          "Failed to release lease for file " + src +
          ". Committed blocks are waiting to be minimally replicated " +
          blockStateStr + ". Try again later.";
      NameNode.stateChangeLog.warn(message);
      throw new AlreadyBeingCreatedException(message);
    case UNDER_CONSTRUCTION:
{code}

That is, add the {{getBlockManager().forceCompleteBlock}} for the neede blocks 
before calling
{code}
        finalizeINodeFileUnderConstruction(src, pendingFile,
            iip.getLatestSnapshotId());
{code}

However, there is a problem because of the way your test is written, the 
{{BlockInfoUCDesired}} in your test is not allowed to be converted to 
{{(BlockInfoUnderConstruction}}. I was looking a bit more into fixing that.

Make sense to you? If you agree, maybe you can look further into this direction?

BTW, notice I had the block state added to the messsage printing, which I think 
is important. Also I added comments in between the two {{case}} statements.

Thanks again!


> Lease Recovery doesn't happen some times
> ----------------------------------------
>
>                 Key: HDFS-7342
>                 URL: https://issues.apache.org/jira/browse/HDFS-7342
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: HDFS-7342.1.patch, HDFS-7342.2.patch
>
>
> In some cases, LeaseManager tries to recover a lease, but is not able to. 
> HDFS-4882 describes a possibility of that. We should fix this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to