Yongjun Zhang created HDFS-6825:
-----------------------------------
Summary: Edit log corruption due to delayed block removal
Key: HDFS-6825
URL: https://issues.apache.org/jira/browse/HDFS-6825
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
Observed the following stack:
{code}
2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception while updating disk space.
java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
{code}
Found that this is what happened:
- The client created the file /solr/hierarchy/core_node1/data/tlog/tlog.xyz.
- The client tried to append to this file, but the lease had expired, so lease
recovery was started and the append failed.
- The file was deleted, but its pending blocks had not yet been removed.
- commitBlockSynchronization() was then called (see the stack above), and an
INodeFile was derived from the pending block, unaware that the file had already
been deleted.
- FSDirectory.updateSpaceConsumed threw a FileNotFoundException, which was
swallowed by commitOrCompleteLastBlock.
- closeFileCommitBlocks went on to call finalizeINodeFileUnderConstruction and
wrote a CloseOp for the already-deleted file to the edit log.
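
The sequence above can be reproduced in a toy model. The sketch below is only an illustration and not NameNode code: the class DelayedBlockRemovalSketch, the ToyFile type, the path /demo/tlog, the checkDeleted flag and the string edit-log entries are all hypothetical. It assumes that delete removes the path from the namespace immediately while the block-to-file mapping is cleaned up later, and shows how a check that the file is still reachable would keep the close record out of the log. Whether the real fix should take this form is not established here.

{code}
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are the real HDFS classes.
class DelayedBlockRemovalSketch {
    /** Stand-in for an inode; only the path is modelled. */
    static final class ToyFile {
        final String path;
        ToyFile(String path) { this.path = path; }
    }

    private final Map<String, ToyFile> namespace = new HashMap<>(); // path -> file
    private final Map<Long, ToyFile> blocksMap = new HashMap<>();   // block id -> owning file
    final List<String> editLog = new ArrayList<>();

    void create(String path, long blockId) {
        ToyFile f = new ToyFile(path);
        namespace.put(path, f);
        blocksMap.put(blockId, f);
        editLog.add("OP_ADD " + path);
    }

    /** Removes the path right away; the block entry is left for later cleanup. */
    void delete(String path) {
        namespace.remove(path);
        editLog.add("OP_DELETE " + path);
    }

    /** The delayed cleanup step that has not yet run in the scenario above. */
    void removeBlock(long blockId) {
        blocksMap.remove(blockId);
    }

    /** Returns true if a close record was written to the toy edit log. */
    boolean commitBlockSynchronization(long blockId, boolean checkDeleted) {
        ToyFile f = blocksMap.get(blockId);
        if (f == null) {
            return false; // block already cleaned up
        }
        // Hypothetical guard: the file must still be the one reachable at its path.
        if (checkDeleted && namespace.get(f.path) != f) {
            return false;
        }
        editLog.add("OP_CLOSE " + f.path);
        return true;
    }

    /** Replays create, delete, then a late commitBlockSynchronization. */
    static List<String> run(boolean checkDeleted) {
        DelayedBlockRemovalSketch ns = new DelayedBlockRemovalSketch();
        ns.create("/demo/tlog", 1L);
        ns.delete("/demo/tlog");
        ns.commitBlockSynchronization(1L, checkDeleted);
        return ns.editLog;
    }

    public static void main(String[] args) {
        System.out.println("without check: " + run(false));
        System.out.println("with check:    " + run(true));
    }
}
```
{code}

Without the check, the toy edit log ends with a close record that follows the delete record for the same path, which mirrors the CloseOp described in the last step. With the check, the log stops at the delete record.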