[
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825970#comment-13825970
]
Jing Zhao commented on HDFS-5428:
---------------------------------
Thanks for the review, Nicholas!
bq. In loadINode(..), we should also check if lastBlk.getNumBytes() <
blockSize. If it is a full block, we should not convert it to
BlockInfoUnderConstruction.
I also thought about that. Is it possible that the client has just written a
full block and called sync(update-length), but has not yet requested an
additional block or closed the file? In that case we would have a full-size
block that is still under construction. If we treat this block as a complete
block and write it into the fsimage, then when the NN restarts and receives a
block report from the DN, we may skip the special handling added in HDFS-5283
in BlockManager#processFirstBlockReport, and this may cause the NN to stay in
SafeMode.
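
The two options can be compared with a small toy model. This is only a sketch:
LastBlockSketch, its State enum, and both method names are hypothetical
stand-ins, not the real loadINode(..) or BlockInfoUnderConstruction code, and
it assumes the file itself is still under construction when the image is
loaded.

```java
// Toy model only: hypothetical names, not the real HDFS classes.
final class LastBlockSketch {
    enum State { COMPLETE, UNDER_CONSTRUCTION }

    /**
     * Variant suggested in the review: convert the last block only when it
     * is shorter than the configured block size.
     */
    static State convertOnlyIfPartial(long lastBlockBytes, long blockSize) {
        return lastBlockBytes < blockSize
                ? State.UNDER_CONSTRUCTION
                : State.COMPLETE;
    }

    /**
     * Variant argued for in this comment: the last block of a file that is
     * still open is always treated as under construction, even when full.
     */
    static State convertAlways(long lastBlockBytes, long blockSize) {
        return State.UNDER_CONSTRUCTION;
    }
}
```

With an example block size of 128 MB, a last block of exactly 128 MB stays
COMPLETE under the first variant, which is the case that could bypass the
HDFS-5283 handling. The second variant keeps it under construction.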
> under construction files deletion after snapshot+checkpoint+nn restart leads
> nn safemode
> ----------------------------------------------------------------------------------------
>
> Key: HDFS-5428
> URL: https://issues.apache.org/jira/browse/HDFS-5428
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Vinay
> Assignee: Vinay
> Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch,
> HDFS-5428.001.patch, HDFS-5428.002.patch, HDFS-5428.003.patch, HDFS-5428.patch
>
>
> 1. allow snapshots under dir /foo
> 2. create a file /foo/test/bar and start writing to it
> 3. create a snapshot s1 under /foo after a block is allocated and some data
> has been written to it
> 4. Delete the directory /foo/test
> 5. wait till checkpoint or do saveNameSpace
> 6. restart NN.
> NN enters safemode.
> Analysis:
> Snapshot nodes loaded from fsimage are always complete and all blocks will be
> in COMPLETE state.
> So when the Datanode reports RBW blocks, they will not be updated in the
> blocksmap.
> Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.
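
The analysis above can be restated as a small decision table. This is a toy
model that only paraphrases the three sentences of the analysis; the class,
enums, and process(..) method are hypothetical and do not reproduce the real
BlockManager logic.

```java
// Toy model of the quoted analysis; hypothetical names, not HDFS code.
final class FirstBlockReportSketch {
    enum StoredState { COMPLETE, UNDER_CONSTRUCTION }
    enum ReplicaState { FINALIZED, RBW }
    enum Outcome { ADDED, IGNORED, MARKED_CORRUPT }

    /** Decide what happens to one reported replica, per the analysis. */
    static Outcome process(StoredState stored, long storedLen,
                           ReplicaState reported, long reportedLen) {
        if (stored == StoredState.UNDER_CONSTRUCTION) {
            // Assumption: an under-construction block accepts the replica.
            return Outcome.ADDED;
        }
        // Stored block is COMPLETE, as for snapshot nodes loaded from fsimage.
        if (reported == ReplicaState.RBW) {
            // "those will not be updated in blocksmap"
            return Outcome.IGNORED;
        }
        // FINALIZED replica: a length mismatch marks it corrupt.
        return reportedLen == storedLen
                ? Outcome.ADDED
                : Outcome.MARKED_CORRUPT;
    }
}
```

In this model, a block stored as COMPLETE never gains a location from an RBW
replica, and a FINALIZED replica whose length differs from the stored length
is marked corrupt. Either way the block has no usable replica, which matches
the description of the NN staying in safemode.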