Yongjun Zhang created HDFS-6825: ----------------------------------- Summary: Edit log corruption due to delayed block removal Key: HDFS-6825 URL: https://issues.apache.org/jira/browse/HDFS-6825 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang
Observed the following stack: {code} 2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false) 2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception while updating disk space. java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) {code} Found this is what happened: - client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz - client tried to append to this file, but the lease expired, so lease recovery is started, thus the append failed - the file get deleted, however, there are still pending blocks of this file not deleted - then commitBlockSynchronization() method is called (see stack above), an InodeFile is created out of the pending block, not aware of that the file was deleted already - FileNotExistException was thrown by FSDirectory.updateSpaceConsumed, but swallowed by commitOrCompleteLastBlock - closeFileCommitBlocks continue to call finalizeINodeFileUnderConstruction and wrote CloseOp to the edit log -- This message was sent by Atlassian JIRA (v6.2#6252)