[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215703#comment-14215703 ]

Yongjun Zhang commented on HDFS-4882:
-------------------------------------

Hi Jing,

I gave some more thought to the block of code I quoted in my last comment. I
think an additional change is still needed, beyond the {{assert false}} I
mentioned there, to ensure the lease is recovered for the case where the
penultimate block is COMMITTED and the last block is COMPLETE (call it caseX
below). Comments below.

Hi Ravi,

I reviewed your latest patch (v4) and have the following comments. The most
important one is that even though the patch removes the infinite loop, the
lease for caseX is still not released if the penultimate block stays in the
COMMITTED state. We should release the lease in that case.
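
To make that concrete, here is a tiny standalone model of the shape of the
{{checkLeases()}} loop (plain Java with invented class and method names, not
the actual LeaseManager/FSNamesystem code). It only illustrates the point:
guarding the loop stops the spin when {{internalReleaseLease()}} makes no
progress, but the lease itself is still held until we explicitly release it,
which is exactly the gap for caseX.
{code}
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only -- all names here are invented for this sketch.
public class CheckLeasesModel {

  static class Lease {
    final String path;
    Lease(String path) { this.path = path; }
  }

  // Stand-in for FSNamesystem.internalReleaseLease(): returns true only if the
  // file could be closed (all blocks minimally replicated).
  static boolean internalReleaseLease(Lease lease) {
    return false;  // simulate a COMMITTED block that never reaches min replication
  }

  public static void main(String[] args) {
    Deque<Lease> expiredLeases = new ArrayDeque<>();
    expiredLeases.add(new Lease("/user/foo/test.dat"));

    // Buggy shape: if releasing fails and the lease stays at the head of the
    // queue, the loop re-examines the same lease forever.
    // Guarded shape (below): stop when no progress is made and retry on the
    // next monitor cycle -- but the lease itself remains held.
    while (!expiredLeases.isEmpty()) {
      Lease oldest = expiredLeases.peek();
      if (internalReleaseLease(oldest)) {
        expiredLeases.remove();   // progress: file closed, lease gone
      } else {
        System.out.println("no progress; lease on " + oldest.path + " is still held");
        break;                    // give up until the next monitor cycle
      }
    }
  }
}
{code}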

* The change in LeaseManager looks good to me. There is also the option of not
making that change if we do what is described below.
* For caseX, to make sure the lease is released, we need to do something like
the following:
{code}
   // Treat a null penultimate block (single-block file) as COMPLETE and
   // minimally replicated.
   boolean penultimateBlockMinReplication = penultimateBlock == null ? true :
       blockManager.checkMinReplication(penultimateBlock);
   BlockUCState penultimateBlockState = penultimateBlock == null ?
       BlockUCState.COMPLETE : penultimateBlock.getBlockUCState();
   String blockStateStr;
   ......
    case COMPLETE:
    case COMMITTED:
      blockStateStr = "(penultimateBlockState=" + penultimateBlockState
          + " lastBlockState=" + lastBlockState + ")";
      // Close the file if the committed blocks are minimally replicated.
      if (penultimateBlockMinReplication &&
          blockManager.checkMinReplication(lastBlock)) {
        finalizeINodeFileUnderConstruction(src, pendingFile,
            iip.getLatestSnapshotId());
        NameNode.stateChangeLog.warn("BLOCK*"
            + " internalReleaseLease: Committed blocks are minimally replicated,"
            + " lease removed, file closed "
            + blockStateStr);
        return true;  // closed!
      }
      // Cannot close the file right now, since some blocks
      // are not yet minimally replicated.
      // This may potentially cause an infinite loop in lease recovery
      // if there are no valid replicas on the data-nodes.
      String message = "DIR* NameSystem.internalReleaseLease: " +
          "Failed to release lease for file " + src +
          ". Committed blocks are waiting to be minimally replicated " +
          blockStateStr + ". Try again later.";
      NameNode.stateChangeLog.warn(message);
      throw new AlreadyBeingCreatedException(message);
{code}

Basically, I suggest letting the two cases share the same code, and including
both blocks' states in the message to distinguish them. This handles the
different scenarios shown in the table below.

|| || Scenario 1 || Scenario 2 || Scenario 3 || Scenario 4 || Scenario 5 || Scenario 6 ||
| penultimateBlock | COMPLETE | COMMITTED | COMMITTED | COMPLETE | COMMITTED | COMMITTED |
| lastBlock | COMMITTED | COMMITTED | COMPLETE | COMMITTED | COMMITTED | COMPLETE |
| minReplicaSatisfied | Yes | Yes | Yes | No | No | No |
| Solution | CloseFile+ReleaseLease | CloseFile+ReleaseLease | CloseFile+ReleaseLease | ReleaseLease | ReleaseLease | ReleaseLease |

Do you and other folks think my proposal makes sense? Thanks a lot.
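
As a self-contained illustration of the table above (plain Java with invented
names, not NameNode code), the decision it encodes boils down to:
{code}
// Illustration of the table above -- all names here are invented for the sketch.
public class LeaseRecoveryDecision {

  enum BlockUCState { COMMITTED, COMPLETE }

  enum Action { CLOSE_FILE_AND_RELEASE_LEASE, RELEASE_LEASE }

  // Mirrors the table: if both the penultimate and the last block are minimally
  // replicated, close the file and release the lease; otherwise still release
  // the lease rather than leaving recovery stuck. The block states are only
  // carried into the diagnostic message.
  static String decide(BlockUCState penultimateBlockState,
                       BlockUCState lastBlockState,
                       boolean minReplicaSatisfied) {
    Action action = minReplicaSatisfied
        ? Action.CLOSE_FILE_AND_RELEASE_LEASE
        : Action.RELEASE_LEASE;
    return action + " (penultimateBlockState=" + penultimateBlockState
        + " lastBlockState=" + lastBlockState + ")";
  }

  public static void main(String[] args) {
    // caseX from the comment: penultimate COMMITTED, last COMPLETE.
    System.out.println(decide(BlockUCState.COMMITTED, BlockUCState.COMPLETE, true));
    System.out.println(decide(BlockUCState.COMMITTED, BlockUCState.COMPLETE, false));
  }
}
{code}
Running it for caseX prints CLOSE_FILE_AND_RELEASE_LEASE when minimal
replication is satisfied and RELEASE_LEASE otherwise, with both block states in
the message.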


> Namenode LeaseManager checkLeases() runs into infinite loop
> -----------------------------------------------------------
>
>                 Key: HDFS-4882
>                 URL: https://issues.apache.org/jira/browse/HDFS-4882
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client, namenode
>    Affects Versions: 2.0.0-alpha, 2.5.1
>            Reporter: Zesheng Wu
>            Assignee: Ravi Prakash
>            Priority: Critical
>         Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, 
> HDFS-4882.2.patch, HDFS-4882.3.patch, HDFS-4882.4.patch, HDFS-4882.patch
>
>
> Scenario:
> 1. cluster with 4 DNs
> 2. the size of the file to be written is a little more than one block
> 3. write the first block to 3 DNs, DN1->DN2->DN3
> 4. all the data packets of the first block are successfully acked and the client
> sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out
> 5. DN2 and DN3 are down
> 6. the client recovers the pipeline, but no new DN is added to the pipeline
> because the current pipeline stage is PIPELINE_CLOSE
> 7. the client continues writing the last block, and tries to close the file
> after writing all the data
> 8. the NN finds that the penultimate block doesn't have enough replicas (our
> dfs.namenode.replication.min=2), the client's close runs into an indefinite
> loop (HDFS-2936), and at the same time the NN sets the last block's state to
> COMPLETE
> 9. shut down the client
> 10. the file's lease exceeds the hard limit
> 11. the LeaseManager realizes that and begins lease recovery by calling
> fsnamesystem.internalReleaseLease()
> 12. but the last block's state is COMPLETE, and this triggers the lease
> manager's infinite loop and prints massive logs like this:
> {noformat}
> 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit
> 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src=/user/h_wuzesheng/test.dat
> 2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE
> 2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1]
> {noformat}
> (the 3rd log entry is a debug log added by us)


