[
https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098657#comment-14098657
]
Kihwal Lee commented on HDFS-6825:
----------------------------------
> Could we also check that this works with a recursive delete on the containing
> folder of the open file?
I assume the change in {{isFileDeleted()}} is for this. I believe the
recursive check is not necessary. When a tree is deleted, everything under it
is recursively processed while holding the FSNamesystem and FSDirectory write
locks. If an inode does not belong to any snapshot, its parent and blocks
fields are cleared; if it is in a snapshot, it is marked as deleted. The only
thing that is not cleared under the lock, and that causes this issue, is the
block collection field of BlockInfo. So {{isFileDeleted()}} does not need to
walk up the tree.
The rest of the patch looks good.
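For reference, a minimal sketch of the non-recursive check described above;
the accessor names are illustrative assumptions, not necessarily what the
patch does:
{code}
// Sketch only: relies on the invariant above -- the delete path, while
// holding the FSNamesystem and FSDirectory write locks, clears the parent
// and blocks fields of every non-snapshotted inode it removes.
private boolean isFileDeleted(INodeFile file) {
  // A file whose parent or blocks field was cleared has been removed by a
  // (possibly recursive) delete, so no walk up the tree is needed.
  if (file.getParent() == null || file.getBlocks() == null) {
    return true;
  }
  // A file kept alive only by a snapshot is marked as deleted rather than
  // cleared; isMarkedAsDeleted() is a hypothetical accessor for that flag.
  return file.isMarkedAsDeleted();
}
{code}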
> Edit log corruption due to delayed block removal
> ------------------------------------------------
>
> Key: HDFS-6825
> URL: https://issues.apache.org/jira/browse/HDFS-6825
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.5.0
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Attachments: HDFS-6825.001.patch, HDFS-6825.002.patch,
> HDFS-6825.003.patch, HDFS-6825.004.patch, HDFS-6825.005.patch
>
>
> Observed the following stack:
> {code}
> 2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
> 2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception while updating disk space.
> java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
>         at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
>         at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> {code}
> Found this is what happened:
> - the client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
> - the client tried to append to this file, but the lease had expired, so
> lease recovery was started and the append failed
> - the file got deleted, yet some pending blocks of the file had not been
> removed
> - commitBlockSynchronization() was then called (see the stack above); an
> INodeFile was created out of the pending block, unaware that the file had
> already been deleted (a sketch of the guard this suggests follows the list)
> - FSDirectory.updateSpaceConsumed threw a FileNotFoundException, but it was
> swallowed by commitOrCompleteLastBlock
> - closeFileCommitBlocks went on to call finalizeINodeFileUnderConstruction
> and wrote a CloseOp to the edit log
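> A minimal sketch of the guard this sequence suggests for
> commitBlockSynchronization(), checking for deletion before the file is
> finalized; the details are illustrative, not the literal patch:
> {code}
> // Resolve the file from the block under recovery, then refuse to finalize
> // it (and thus to write a CloseOp) if the file was already deleted and only
> // its block removal is still pending.
> INodeFile iFile = ((INode) storedBlock.getBlockCollection()).asFile();
> if (isFileDeleted(iFile)) {
>   throw new FileNotFoundException("File not found: "
>       + iFile.getFullPathName() + ", likely due to delayed block removal");
> }
> {code}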