Yongjun Zhang created HDFS-6825:
-----------------------------------

             Summary: Edit log corruption due to delayed block removal
                 Key: HDFS-6825
                 URL: https://issues.apache.org/jira/browse/HDFS-6825
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.5.0
            Reporter: Yongjun Zhang
            Assignee: Yongjun Zhang


Observed the following stack:
{code}
2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception while updating disk space.
java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
        at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
{code}

Found that this is what happened:

- The client created the file /solr/hierarchy/core_node1/data/tlog/tlog.xyz.
- The client tried to append to this file, but the lease had expired, so lease
recovery was started and the append failed.
- The file was deleted; however, pending blocks of this file had not yet been
removed.
- commitBlockSynchronization() was then called (see the stack above), and an
INodeFile was created out of the pending block, unaware that the file had
already been deleted.
- FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but
swallowed by commitOrCompleteLastBlock.
- closeFileCommitBlocks continued on to call finalizeINodeFileUnderConstruction
and wrote a CloseOp to the edit log.
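
The sequence above can be sketched with a toy model. This is only an
illustration and not the NameNode code: the class DelayedBlockRemovalSketch,
its methods, and the op strings are all hypothetical stand-ins, and the guarded
variant shows just one possible check, not necessarily the fix for this issue.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the race described above; every name here is hypothetical. */
public class DelayedBlockRemovalSketch {
    /** Paths that currently exist in the modelled namespace. */
    final Set<String> namespace = new HashSet<>();
    /** Ops appended to the modelled edit log, in order. */
    final List<String> editLog = new ArrayList<>();

    void create(String path) {
        namespace.add(path);
        editLog.add("OP_ADD " + path);
    }

    /** Removes the path; in the real scenario its pending blocks linger. */
    void delete(String path) {
        namespace.remove(path);
        editLog.add("OP_DELETE " + path);
    }

    /** Stand-in for the quota update that fails once the path is gone. */
    void updateSpaceConsumed(String path) throws FileNotFoundException {
        if (!namespace.contains(path)) {
            throw new FileNotFoundException("Path not found: " + path);
        }
    }

    /** Mirrors the reported behaviour: the exception is swallowed and a close op is still logged. */
    void commitBlockSyncSwallowing(String path) {
        try {
            updateSpaceConsumed(path);
        } catch (FileNotFoundException e) {
            // Only warned about in the reported scenario; processing continues.
        }
        editLog.add("OP_CLOSE " + path);
    }

    /** One possible guard (an assumption): skip the close op if the file no longer exists. */
    boolean commitBlockSyncGuarded(String path) {
        if (!namespace.contains(path)) {
            return false;
        }
        editLog.add("OP_CLOSE " + path);
        return true;
    }

    public static void main(String[] args) {
        String path = "/example/tlog"; // hypothetical path
        DelayedBlockRemovalSketch buggy = new DelayedBlockRemovalSketch();
        buggy.create(path);
        buggy.delete(path);
        buggy.commitBlockSyncSwallowing(path);
        System.out.println("swallowing: " + buggy.editLog);

        DelayedBlockRemovalSketch guarded = new DelayedBlockRemovalSketch();
        guarded.create(path);
        guarded.delete(path);
        guarded.commitBlockSyncGuarded(path);
        System.out.println("guarded:    " + guarded.editLog);
    }
}
```

In the swallowing variant the modelled log ends with a close op for a path
that was deleted one op earlier, which is the kind of entry that cannot be
applied cleanly when the log is replayed. The guarded variant leaves the
delete as the last op.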




--
This message was sent by Atlassian JIRA
(v6.2#6252)
