[ 
https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279468#comment-13279468
 ] 

amith commented on HDFS-2994:
-----------------------------

I have seen this problem when lease recovery is in progress while an append 
call is made on the same file.

Currently FSDirectory.replaceNode() is called from two methods:
FSNamesystem#finalizeINodeFileUnderConstruction()
FSNamesystem#prepareFileForWrite() 

Both methods call it to swap the file's INode entry in the NN metadata (the 
INode structure, e.g. INodeFile -> INodeFileUnderConstruction or the reverse).

Consider how the replacement INode is constructed in these methods:

{code}
public LocatedBlock prepareFileForWrite(String src, INode file,
      String leaseHolder, String clientMachine, DatanodeDescriptor clientNode,
      boolean writeToEditLog)
      throws UnresolvedLinkException, IOException {
    INodeFile node = (INodeFile) file;
    INodeFileUnderConstruction cons = new INodeFileUnderConstruction(
                                    node.getLocalNameBytes(),
                                    node.getReplication(),
                                    node.getModificationTime(),
                                    node.getPreferredBlockSize(),
                                    node.getBlocks(),
                                    node.getPermissionStatus(),
                                    leaseHolder,
                                    clientMachine,
                                    clientNode);
    dir.replaceNode(src, node, cons);
    leaseManager.addLease(cons.getClientName(), src);
    
    LocatedBlock ret = blockManager.convertLastBlockToUnderConstruction(cons);
    if (writeToEditLog) {
      getEditLog().logOpenFile(src, cons);
    }
    return ret;
  }
{code}
The INodeFileUnderConstruction constructor does not capture the INode.parent 
attribute, so cons ends up with a null parent.
Similarly:

{code}
private void finalizeINodeFileUnderConstruction(String src, 
      INodeFileUnderConstruction pendingFile) 
      throws IOException, UnresolvedLinkException {
    assert hasWriteLock();
    leaseManager.removeLease(pendingFile.getClientName(), src);

    // The file is no longer pending.
    // Create permanent INode, update blocks
    INodeFile newFile = pendingFile.convertToInodeFile();
    dir.replaceNode(src, pendingFile, newFile);

    // close file and persist block allocations for this file
    dir.closeFile(src, newFile);

    checkReplicationFactor(newFile);
  }
{code}
pendingFile.convertToInodeFile() also loses the parent attribute, leaving 
newFile with a null parent.
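
To make the parent loss concrete, here is a minimal, self-contained sketch. It is a toy model and not the Hadoop classes: Dir, Node, convertWithoutParent and convertWithParent are hypothetical names introduced only for illustration. It assumes that a conversion which copies the name but not the parent leaves the new node detached, so a later removeNode() on it reports failure.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these are NOT the Hadoop INode classes.
class ParentLossSketch {

    static class Dir {
        final List<Node> children = new ArrayList<>();

        void addChild(Node n) {
            children.add(n);
            n.parent = this;
        }
    }

    static class Node {
        final String name;
        Dir parent; // null until the node is attached to a directory

        Node(String name) {
            this.name = name;
        }

        // Detaches this node from its parent; reports failure when there is no parent.
        boolean removeNode() {
            if (parent == null) {
                return false;
            }
            parent.children.remove(this);
            parent = null;
            return true;
        }
    }

    // Mimics a conversion that copies the name but forgets the parent.
    static Node convertWithoutParent(Node src) {
        return new Node(src.name);
    }

    // Hypothetical alternative that carries the parent reference over.
    static Node convertWithParent(Node src) {
        Node copy = new Node(src.name);
        copy.parent = src.parent;
        return copy;
    }

    public static void main(String[] args) {
        Dir dir = new Dir();
        Node file = new Node("test_io_6");
        dir.addChild(file);

        Node lost = convertWithoutParent(file);
        Node kept = convertWithParent(file);

        System.out.println("parent lost: " + (lost.parent == null));
        System.out.println("removeNode on converted node: " + lost.removeNode());
        System.out.println("parent kept: " + (kept.parent == dir));
    }
}
```

The sketch only shows the effect of not copying the parent reference; how the real constructors should propagate it is up to the patch.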

I have also modified removeNode() so that it no longer clears the parent:

{code}
boolean removeNode() {
    if (parent == null) {
      return false;
    } else {
      parent.removeChild(this);
-     parent=null;
      return true;
    }
  } 
{code}
This is needed because in 
{code}
      INode myFile = dir.getFileINode(src);
      recoverLeaseInternal(myFile, src, holder, clientMachine, false);
{code}

myFile loses its parent attribute inside recoverLeaseInternal.
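
The stale-reference effect can be sketched in the same toy style. Again, none of this is Hadoop code: Dir, Node, replaceNode and secondReplaceSucceeds are hypothetical names, and the sketch assumes that replaceNode relies on removeNode() of the old node, as the "failed to remove" log message suggests. The clearParentOnRemove flag switches between the existing behaviour and the modified removeNode().

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these are NOT the Hadoop INode classes.
class StaleReferenceSketch {

    static class Dir {
        final List<Node> children = new ArrayList<>();

        void addChild(Node n) {
            children.add(n);
            n.parent = this;
        }
    }

    static class Node {
        Dir parent;
        final boolean clearParentOnRemove; // true = existing behaviour, false = modified

        Node(boolean clearParentOnRemove) {
            this.clearParentOnRemove = clearParentOnRemove;
        }

        boolean removeNode() {
            if (parent == null) {
                return false;
            }
            parent.children.remove(this);
            if (clearParentOnRemove) {
                parent = null;
            }
            return true;
        }
    }

    // Replaces oldNode with newNode; fails when oldNode cannot be removed.
    static boolean replaceNode(Dir dir, Node oldNode, Node newNode) {
        if (!oldNode.removeNode()) {
            return false;
        }
        dir.addChild(newNode);
        return true;
    }

    // Returns the result of the second replaceNode call made through the stale reference.
    static boolean secondReplaceSucceeds(boolean clearParentOnRemove) {
        Dir dir = new Dir();
        Node myFile = new Node(clearParentOnRemove);
        dir.addChild(myFile);

        // First replacement, as would happen during lease recovery.
        replaceNode(dir, myFile, new Node(clearParentOnRemove));

        // The caller still holds myFile and tries to replace it again.
        return replaceNode(dir, myFile, new Node(clearParentOnRemove));
    }

    public static void main(String[] args) {
        System.out.println("parent cleared on remove: " + secondReplaceSucceeds(true));
        System.out.println("parent kept on remove:    " + secondReplaceSucceeds(false));
    }
}
```

In this toy model the variant that keeps the parent also leaves two replacement nodes in the directory, because the stale node was already detached. The sketch only illustrates the return value of removeNode() and not the full bookkeeping of the real FSDirectory.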

A test has been added to verify this behaviour. It creates 3 clients, each 
with a different 
{code}
mapreduce.task.attempt.id
{code}

so that each client has a different lease holder and lease recovery is 
triggered when the file is accessed by another client.
 
                
> If lease is recovered successfully inline with create, create can fail
> ----------------------------------------------------------------------
>
>                 Key: HDFS-2994
>                 URL: https://issues.apache.org/jira/browse/HDFS-2994
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: amith
>         Attachments: HDFS-2994_1.patch
>
>
> I saw the following logs on my test cluster:
> {code}
> 2012-02-22 14:35:22,887 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease 
> [Lease.  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, 
> pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client 
> DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1
> 2012-02-22 14:35:22,887 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. 
>  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, 
> pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* 
> internalReleaseLease: All existing blocks are COMPLETE, lease removed, file 
> closed.
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> FSDirectory.replaceNode: failed to remove 
> /benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.startFile: FSDirectory.replaceNode: failed to remove 
> /benchmarks/TestDFSIO/io_data/test_io_6
> {code}
> It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, 
> then the INode will be replaced with a new one, meaning the later 
> {{replaceNode}} call can fail.
