[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825970#comment-13825970 ]
Jing Zhao commented on HDFS-5428:
---------------------------------

Thanks for the review, Nicholas!

bq. In loadINode(..), we should also check if lastBlk.getNumBytes() < blockSize. If it is a full block, we should not convert it to BlockInfoUnderConstruction.

I thought about this as well. Is it possible that the client has just written a full block and called sync (update-length), but has not yet tried to get an additional block or close the file? In that case we would have a full-size block that is still under construction. If we treat this block as a complete block and write it into the fsimage, then later, when the NN restarts and receives a block report from the DN, we may miss the special processing added in HDFS-5283 in BlockManager#processFirstBlockReport. This may cause the NN to stay in SafeMode.

> under construction files deletion after snapshot+checkpoint+nn restart leads
> nn safemode
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-5428
>                 URL: https://issues.apache.org/jira/browse/HDFS-5428
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch,
> HDFS-5428.001.patch, HDFS-5428.002.patch, HDFS-5428.003.patch, HDFS-5428.patch
>
>
> 1. Allow snapshots under dir /foo.
> 2. Create a file /foo/test/bar and start writing to it.
> 3. Create a snapshot s1 under /foo after a block is allocated and some data has
> been written to it.
> 4. Delete the directory /foo/test.
> 5. Wait until a checkpoint happens, or do saveNameSpace.
> 6. Restart the NN.
> The NN enters safemode.
> Analysis:
> Snapshot nodes loaded from the fsimage are always complete, and all their blocks
> will be in COMPLETE state.
> So when the Datanode reports RBW blocks, those will not be updated in the
> blocksmap.
> Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
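
For readers following the thread, the full-block case raised in the comment can be illustrated with a toy model. This is only a sketch under stated assumptions: the class LastBlockSketch, the State enum, and both methods are hypothetical and are not the HDFS classes or the code in any of the attached patches. It simply contrasts the length check suggested in the review with a rule that follows the file's under-construction state alone.

```java
// Toy model only. These names are hypothetical and are NOT the real HDFS
// classes (INodeFile, BlockInfoUnderConstruction, FSImageFormat, ...).
public class LastBlockSketch {

    public enum State { COMPLETE, UNDER_CONSTRUCTION }

    // Rule suggested in the review: treat the last block as under
    // construction only if it is shorter than the configured block size.
    public static State withLengthCheck(boolean fileUnderConstruction,
                                        long lastBlockBytes, long blockSize) {
        if (fileUnderConstruction && lastBlockBytes < blockSize) {
            return State.UNDER_CONSTRUCTION;
        }
        return State.COMPLETE;
    }

    // Alternative implied by the reply: ignore the length and follow the
    // file's under-construction state.
    public static State withoutLengthCheck(boolean fileUnderConstruction,
                                           long lastBlockBytes, long blockSize) {
        return fileUnderConstruction ? State.UNDER_CONSTRUCTION : State.COMPLETE;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed block size for the example

        // The client wrote exactly one full block and synced, but has not
        // closed the file or asked for another block.
        System.out.println("with length check:    "
                + withLengthCheck(true, blockSize, blockSize));
        System.out.println("without length check: "
                + withoutLengthCheck(true, blockSize, blockSize));
    }
}
```

In this model the length check classifies the full but still-open block as COMPLETE, which is the situation the comment worries about: after a restart the block would no longer be handled as under construction when the first block report arrives.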