[ https://issues.apache.org/jira/browse/HDFS-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098682#comment-14098682 ]

Yongjun Zhang commented on HDFS-6825:
-------------------------------------

Hi [~kihwal],

Thanks a lot for the review. We were doing the last update at the same time, so I just saw your review comments.

The change in {{isFileDeleted}} is to handle recursive deletion. If we remove 
the change in this method, the test I added fails. Say, for a path 
"/a/b/c/file", if we do {{fs.delete("/a/b", true)}}, what I observed is 
different from what you stated: while holding the write lock, the delete only 
removes "b" from a's children (and delays the rest of the removal until 
later), so {{isFileDeleted}} returned false for "/a/b/c/file".
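
To illustrate the idea, here is a self-contained toy sketch of the ancestor walk-up check (the {{Inode}} class and {{isFileDeleted}} below are hypothetical stand-ins for illustration, not the actual FSNamesystem code):
{code}
import java.util.HashMap;
import java.util.Map;

// Toy inode tree; a hypothetical stand-in for the NameNode's structures.
class Inode {
    final String name;
    final Inode parent;                      // null for the root
    final Map<String, Inode> children = new HashMap<>();

    Inode(String name, Inode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.put(name, this);
        }
    }
}

public class DeleteCheckSketch {
    /**
     * A file counts as deleted if any ancestor no longer lists the next
     * inode on the path as its child. With delayed block removal, only the
     * top of the deleted subtree ("b") is detached under the write lock,
     * so checking the file's own parent link alone is not enough.
     */
    static boolean isFileDeleted(Inode file) {
        for (Inode cur = file; cur.parent != null; cur = cur.parent) {
            if (cur.parent.children.get(cur.name) != cur) {
                return true;                 // detached somewhere up the path
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Inode root = new Inode("/", null);
        Inode a = new Inode("a", root);
        Inode b = new Inode("b", a);
        Inode c = new Inode("c", b);
        Inode file = new Inode("file", c);

        // Simulate fs.delete("/a/b", true) under the write lock: only "b"
        // is removed from a's children; c and file remain linked below it.
        a.children.remove("b");

        System.out.println(isFileDeleted(file));   // prints: true
    }
}
{code}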

I just reran the test to collect a log for your reference. The exception below 
happens when the test restarts the NN to check whether the edit log is 
corrupted. The fix I introduced in {{isFileDeleted}} solves this problem:
{code}
Running org.apache.hadoop.hdfs.server.namenode.TestDeleteRace
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.297 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestDeleteRace
testDeleteAndCommitBlockSynchronizationRaceHasSnapshot(org.apache.hadoop.hdfs.server.namenode.TestDeleteRace)  Time elapsed: 7.101 sec  <<< ERROR!
java.io.FileNotFoundException: File does not exist: /testdir/testdir1/test-file
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:65)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:412)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:227)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:136)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:820)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:678)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:972)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:715)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:533)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:589)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:756)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:740)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1425)
        at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:1696)
        at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNodes(MiniDFSCluster.java:1651)
        at org.apache.hadoop.hdfs.server.namenode.TestDeleteRace.testDeleteAndCommitBlockSynchronizationRace(TestDeleteRace.java:317)
        at org.apache.hadoop.hdfs.server.namenode.TestDeleteRace.testDeleteAndCommitBlockSynchronizationRaceHasSnapshot(TestDeleteRace.java:338)
{code}
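
Stripped of the fault-injection plumbing, the flow that triggers the restart check looks roughly like this (a minimal sketch: the real TestDeleteRace uses spies to widen the race window, and without the induced race this sequence simply succeeds):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class EditLogReplaySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new HdfsConfiguration();
        MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();
        try {
            DistributedFileSystem fs = cluster.getFileSystem();
            Path file = new Path("/testdir/testdir1/test-file");
            DFSTestUtil.createFile(fs, file, 1024L, (short) 1, 0L);

            // Recursive delete of an ancestor: with delayed block removal,
            // only the top directory is detached under the write lock.
            fs.delete(new Path("/testdir"), true);

            // Restarting the NN replays the edit log; a stray CloseOp for
            // the already-deleted file surfaces as FileNotFoundException.
            cluster.restartNameNodes();
        } finally {
            cluster.shutdown();
        }
    }
}
{code}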

Thanks.


> Edit log corruption due to delayed block removal
> ------------------------------------------------
>
>                 Key: HDFS-6825
>                 URL: https://issues.apache.org/jira/browse/HDFS-6825
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.5.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-6825.001.patch, HDFS-6825.002.patch, 
> HDFS-6825.003.patch, HDFS-6825.004.patch, HDFS-6825.005.patch
>
>
> Observed the following stack:
> {code}
> 2014-08-04 23:49:44,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-.., newgenerationstamp=..., newlength=..., newtargets=..., closeFile=true, deleteBlock=false)
> 2014-08-04 23:49:44,133 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unexpected exception while updating disk space. 
> java.io.FileNotFoundException: Path not found: /solr/hierarchy/core_node1/data/tlog/tlog.xyz
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateSpaceConsumed(FSDirectory.java:1807)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3975)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.closeFileCommitBlocks(FSNamesystem.java:4178)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:4146)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.commitBlockSynchronization(NameNodeRpcServer.java:662)
>         at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.commitBlockSynchronization(DatanodeProtocolServerSideTranslatorPB.java:270)
>         at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28073)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> {code}
> Found this is what happened:
> - client created file /solr/hierarchy/core_node1/data/tlog/tlog.xyz
> - client tried to append to this file, but the lease had expired, so lease 
> recovery was started and the append failed
> - the file got deleted; however, there were still pending blocks of this 
> file that had not been removed
> - then the commitBlockSynchronization() method was called (see stack above), 
> and an INodeFile was created out of the pending block, unaware that the file 
> had already been deleted
> - FileNotFoundException was thrown by FSDirectory.updateSpaceConsumed, but 
> swallowed by commitOrCompleteLastBlock
> - closeFileCommitBlocks continued to call finalizeINodeFileUnderConstruction 
> and wrote a CloseOp to the edit log
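
In essence, the missing guard in the steps quoted above is a deletion check before the CloseOp is written. A self-contained toy illustration of that guard (the registry, edit-log buffer, and method below are hypothetical stand-ins, not the actual NameNode code):
{code}
import java.util.HashSet;
import java.util.Set;

public class CommitGuardSketch {
    // Hypothetical stand-ins for the NameNode's namespace and edit log.
    static final Set<String> liveFiles = new HashSet<>();
    static final StringBuilder editLog = new StringBuilder();

    // Stand-in for commitBlockSynchronization(..., closeFile=true).
    static void commitBlockSynchronization(String path, boolean closeFile) {
        if (!liveFiles.contains(path)) {
            // File already deleted (possibly via a delayed recursive
            // delete): skip finalization so no CloseOp for a nonexistent
            // file reaches the edit log.
            return;
        }
        if (closeFile) {
            editLog.append("CloseOp ").append(path).append('\n');
        }
    }

    public static void main(String[] args) {
        liveFiles.add("/a/b/c/file");
        liveFiles.remove("/a/b/c/file");  // the delete wins the race
        commitBlockSynchronization("/a/b/c/file", true);
        System.out.println(editLog.length() == 0
            ? "no stray CloseOp written" : editLog.toString());
    }
}
{code}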


