[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825970#comment-13825970
 ] 

Jing Zhao commented on HDFS-5428:
---------------------------------

Thanks for the review, Nicholas!

bq. In loadINode(..), we should also check if lastBlk.getNumBytes() < 
blockSize. If it is a full block, we should not convert it to 
BlockInfoUnderConstruction.

I also thought about that. But is it possible that the client has just written a 
full block and called sync(update-length), but has not yet tried to get an 
additional block or close the file? In that case, we would have a full-size 
block that is still under construction. If we treat this block as a complete 
block and write it into the fsimage, then later, when the NN restarts and 
receives a block report from the DN, we may miss the special processing added 
in HDFS-5283 in BlockManager#processFirstBlockReport, and this may cause the 
NN to stay in SafeMode. 
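
To make the two options concrete, here is a minimal sketch of the decision 
being discussed. It is only a simplified model: the class LastBlockSketch, 
the State enum and both methods are hypothetical names for illustration, not 
the actual loadINode(..) or BlockInfoUnderConstruction code, and the 128 MB 
block size is an assumed value.

```java
// Hypothetical, simplified model of the last-block decision for an
// under-construction file loaded from the fsimage. Not real HDFS code.
public class LastBlockSketch {
    enum State { COMPLETE, UNDER_CONSTRUCTION }

    // Reviewer's suggestion: convert only when the last block is partial,
    // i.e. lastBlk.getNumBytes() < blockSize.
    static State convertOnlyIfPartial(boolean fileUnderConstruction,
                                      long lastBlkBytes, long blockSize) {
        return (fileUnderConstruction && lastBlkBytes < blockSize)
                ? State.UNDER_CONSTRUCTION
                : State.COMPLETE;
    }

    // Alternative: convert the last block of every under-construction file,
    // regardless of its length.
    static State convertAlways(boolean fileUnderConstruction,
                               long lastBlkBytes, long blockSize) {
        return fileUnderConstruction ? State.UNDER_CONSTRUCTION : State.COMPLETE;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed block size

        // The case in question: the client wrote exactly one full block and
        // synced, but has neither requested another block nor closed the file.
        System.out.println(convertOnlyIfPartial(true, blockSize, blockSize));
        System.out.println(convertAlways(true, blockSize, blockSize));
    }
}
```

With the length check, the full-size block in this scenario is treated as 
complete even though the file is still open, which is the case I am worried 
about above.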


> under construction files deletion after snapshot+checkpoint+nn restart leads 
> nn safemode
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-5428
>                 URL: https://issues.apache.org/jira/browse/HDFS-5428
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, 
> HDFS-5428.001.patch, HDFS-5428.002.patch, HDFS-5428.003.patch, HDFS-5428.patch
>
>
> 1. allow snapshots under dir /foo
> 2. create a file /foo/test/bar and start writing to it
> 3. create a snapshot s1 under /foo after block is allocated and some data has 
> been written to it
> 4. Delete the directory /foo/test
> 5. wait till checkpoint or do saveNameSpace
> 6. restart NN.
> NN enters safemode.
> Analysis:
> Snapshot nodes loaded from fsimage are always complete and all blocks will be 
> in COMPLETE state. 
> So when the Datanode reports RBW blocks those will not be updated in 
> blocksmap.
> Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
