[ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815775#comment-13815775
 ] 

Jing Zhao commented on HDFS-5428:
---------------------------------

bq. if the same INode is referring to a completed file [might be due to rename 
and lease recovery] in normal path 

We will replace the whole INode if it is in the normal path. We replace only 
its last block if the file exists only in a snapshot. But the next time we do 
a checkpoint, we may need to check a file's last block to decide whether it is 
a fileUC.

Another option here is to replace the inode in all cases. To get around the 
problem that we cannot get the full snapshot path, we can use the inode id to 
get the inode first, then scan the diff list of its parent to do the 
replacement. This will be inefficient, but it might be OK as long as we do not 
have a lot of snapshots and inodeUC.
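
To make the second option concrete, here is a rough sketch of "look up the 
inode by id, then scan the parent's diff list". It is only an illustration 
under simplifying assumptions: FileNode, DirNode, DirDiff, and 
replaceWithUnderConstruction are hypothetical names, not the real HDFS 
classes, and a diff is modeled as just the list of children deleted since 
that snapshot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; none of these are real HDFS classes.
final class SnapshotUcReplaceSketch {
    /** Stand-in for a file inode. */
    static final class FileNode {
        final long id;
        final long parentId;
        final boolean underConstruction;
        FileNode(long id, long parentId, boolean underConstruction) {
            this.id = id;
            this.parentId = parentId;
            this.underConstruction = underConstruction;
        }
    }

    /** One diff-list entry: children deleted since the named snapshot. */
    static final class DirDiff {
        final String snapshotName;
        final List<FileNode> deleted = new ArrayList<>();
        DirDiff(String snapshotName) { this.snapshotName = snapshotName; }
    }

    /** Stand-in for a snapshottable directory with its diff list. */
    static final class DirNode {
        final long id;
        final List<DirDiff> diffs = new ArrayList<>();
        DirNode(long id) { this.id = id; }
    }

    final Map<Long, FileNode> filesById = new HashMap<>();
    final Map<Long, DirNode> dirsById = new HashMap<>();

    /**
     * Replaces the inode with an under-construction copy in the id map and in
     * every diff of its parent. Returns the number of diff entries replaced.
     */
    int replaceWithUnderConstruction(long inodeId) {
        FileNode old = filesById.get(inodeId);
        if (old == null) {
            return 0; // unknown inode id: nothing to do
        }
        FileNode uc = new FileNode(old.id, old.parentId, true);
        filesById.put(inodeId, uc);

        DirNode parent = dirsById.get(old.parentId);
        if (parent == null) {
            return 0;
        }
        int replaced = 0;
        // Linear scan over all diffs and their deleted lists: this is the
        // inefficiency mentioned above.
        for (DirDiff diff : parent.diffs) {
            for (int i = 0; i < diff.deleted.size(); i++) {
                if (diff.deleted.get(i).id == inodeId) {
                    diff.deleted.set(i, uc);
                    replaced++;
                }
            }
        }
        return replaced;
    }
}
```

The cost grows with the number of diffs times the size of each deleted list, 
per inodeUC, which is why this is only reasonable with few snapshots and few 
files under construction.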

bq. Now while writing the inode tree to fsimage, inode in s2 comes first and 
then s1, then only INode in s1 will be marked as underconstruction. But the 
actual underconstruction INode is in the s2 snapshot

For rename, we will have only one INode here, which is referenced by two 
INodeReference instances stored in s1 and s2. And since we record only the 
inode id in snapshotUCMap, this scenario might be fine?
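
A small sketch of why the order might not matter in the rename case, assuming 
(as described above) that both references point at the same inode object and 
that the under-construction set is keyed by inode id. FileNode, Reference, and 
markUnderConstruction are hypothetical names for illustration, not the real 
HDFS classes, and this does not show how the fsimage loader actually behaves.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; none of these are real HDFS classes.
final class RenameReferenceSketch {
    /** Stand-in for the single inode shared after a rename. */
    static final class FileNode {
        final long id;
        boolean underConstruction;
        FileNode(long id) { this.id = id; }
    }

    /** Stand-in for a per-snapshot reference to the shared inode. */
    static final class Reference {
        final String snapshotName;
        final FileNode referred;
        Reference(String snapshotName, FileNode referred) {
            this.snapshotName = snapshotName;
            this.referred = referred;
        }
        boolean isUnderConstruction() { return referred.underConstruction; }
    }

    /**
     * Marks every referred inode whose id is in ucIds, visiting references in
     * the given order. Returns the number of distinct inodes marked.
     */
    static int markUnderConstruction(List<Reference> refsInLoadOrder, Set<Long> ucIds) {
        Set<Long> marked = new HashSet<>();
        for (Reference ref : refsInLoadOrder) {
            if (ucIds.contains(ref.referred.id)) {
                // The flag lives on the shared inode, so every reference to
                // it observes the change regardless of visiting order.
                ref.referred.underConstruction = true;
                marked.add(ref.referred.id);
            }
        }
        return marked.size();
    }
}
```

In this model, visiting s2 before s1 or the reverse gives the same result, 
because the id identifies the one shared inode and not a per-snapshot copy.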

> under construction files deletion after snapshot+checkpoint+nn restart leads 
> nn safemode
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-5428
>                 URL: https://issues.apache.org/jira/browse/HDFS-5428
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, 
> HDFS-5428.001.patch, HDFS-5428.patch
>
>
> 1. allow snapshots under dir /foo
> 2. create a file /foo/test/bar and start writing to it
> 3. create a snapshot s1 under /foo after block is allocated and some data has 
> been written to it
> 4. Delete the directory /foo/test
> 5. wait till checkpoint or do saveNameSpace
> 6. restart NN.
> NN enters safemode.
> Analysis:
> Snapshot nodes loaded from fsimage are always complete and all blocks will be 
> in COMPLETE state. 
> So when the Datanode reports RBW blocks those will not be updated in 
> blocksmap.
> Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
