[jira] [Updated] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

Jing Zhao (JIRA) Wed, 06 Nov 2013 15:52:35 -0800

     [ 
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jing Zhao updated HDFS-5428:
----------------------------

    Attachment: HDFS-5428.000.patch

Continuing the discussion from HDFS-5443 here.

So HDFS-5428.000.patch is a simple patch that implements an idea similar to the 
one mentioned in HDFS-5443: 
1) Record extra information in the fsimage to mark INodeFileUCs that exist only 
in snapshots. To keep compatibility, we store this information in the 
"under-construction-files" section of the fsimage and simply use ".snapshot" as 
their paths.
2) Identify these snapshot files while loading fsimage, and temporarily store 
them in a map in SnapshotManager.
3) When calculating the total number of blocks at NN startup, deduct the number 
of files recorded in 2) in addition to the files recorded in the lease map.
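The three steps above can be sketched as a small, self-contained Java model. This is not the code in HDFS-5428.000.patch: `UcEntry`, `SnapshotUcAccounting`, the inode-id key and all method names are hypothetical, and step 3 assumes each under-construction file contributes exactly one incomplete (last) block, so deducting files is the same as deducting blocks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified entry of the "under-construction-files" fsimage section. */
final class UcEntry {
    final long inodeId; // hypothetical key used to find the inode again at load time
    final String path;  // real path, or ".snapshot" for snapshot-only files

    UcEntry(long inodeId, String path) {
        this.inodeId = inodeId;
        this.path = path;
    }
}

/** Hypothetical model of steps 1) to 3); not the actual patch code. */
final class SnapshotUcAccounting {
    static final String SNAPSHOT_PATH = ".snapshot";

    /** Step 1: leased files keep their paths, snapshot-only files are written as ".snapshot". */
    static List<UcEntry> saveUcSection(List<UcEntry> leased, List<Long> snapshotOnlyIds) {
        List<UcEntry> section = new ArrayList<>(leased);
        for (long id : snapshotOnlyIds) {
            section.add(new UcEntry(id, SNAPSHOT_PATH));
        }
        return section;
    }

    private final List<UcEntry> leasedFiles = new ArrayList<>();
    // Step 2: temporary map of snapshot-only UC files (cf. the map kept in SnapshotManager).
    private final Map<Long, UcEntry> snapshotOnlyFiles = new HashMap<>();

    /** Step 2: route each loaded entry by its path. */
    void load(List<UcEntry> section) {
        for (UcEntry e : section) {
            if (SNAPSHOT_PATH.equals(e.path)) {
                snapshotOnlyFiles.put(e.inodeId, e);
            } else {
                leasedFiles.add(e);
            }
        }
    }

    /**
     * Step 3: total the NN should wait for. Assumes one incomplete (last)
     * block per UC file, so deducting files equals deducting blocks.
     */
    long expectedCompleteBlocks(long blocksTotal) {
        return blocksTotal - leasedFiles.size() - snapshotOnlyFiles.size();
    }

    /** The snapshot records are only needed at startup and can be dropped afterwards. */
    void clearSnapshotRecords() {
        snapshotOnlyFiles.clear();
    }

    int snapshotRecordCount() {
        return snapshotOnlyFiles.size();
    }
}
```

Keeping these records outside the lease map means nothing has to maintain them after startup; `clearSnapshotRecords()` models dropping them once the block total has been computed.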

In general, the idea is very similar to Vinay's patch. The difference is that we 
do not keep or maintain records in the lease map; we only handle these files 
when starting the NN. We can even clear the records in SnapshotManager after 
computing the total number of blocks.

One more thing we may need to handle: if we remove the 0-sized blocks 
(HDFS-5443), we may end up with an under-construction file in a snapshot that 
has no corresponding blockUC. In that case we should not record extra 
information in the fsimage for that INodeFileUC. 
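The filtering rule in the previous paragraph could look like the following hypothetical sketch: a snapshot-only INodeFileUC gets an extra fsimage record only if its last block is still under construction. `BlockState`, `SnapshotUcFile` and `SnapshotUcFilter` are illustrative names, not HDFS classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical block states, loosely mirroring COMPLETE vs. under construction. */
enum BlockState { COMPLETE, UNDER_CONSTRUCTION }

/** Hypothetical snapshot-only INodeFileUC: an inode id plus the states of its blocks. */
final class SnapshotUcFile {
    final long inodeId;
    final List<BlockState> blocks;

    SnapshotUcFile(long inodeId, List<BlockState> blocks) {
        this.inodeId = inodeId;
        this.blocks = blocks;
    }

    /** True only if the file has a last block and it is still under construction. */
    boolean hasBlockUc() {
        return !blocks.isEmpty()
            && blocks.get(blocks.size() - 1) == BlockState.UNDER_CONSTRUCTION;
    }
}

final class SnapshotUcFilter {
    /** Only files that still have a blockUC need an extra fsimage record. */
    static List<Long> idsToRecord(List<SnapshotUcFile> files) {
        List<Long> ids = new ArrayList<>();
        for (SnapshotUcFile f : files) {
            if (f.hasBlockUc()) {
                ids.add(f.inodeId);
            }
        }
        return ids;
    }
}
```

A file whose 0-sized last block was removed (modeled here as no blocks, or only COMPLETE ones) is simply skipped, so it contributes nothing to the startup deduction.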

The current patch is just for demonstration. It passes the new unit tests in 
Vinay's patch. If folks think the general idea is OK, we can continue our work 
based on this patch.


> under construction files deletion after snapshot+checkpoint+nn restart leads 
> nn safemode
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-5428
>                 URL: https://issues.apache.org/jira/browse/HDFS-5428
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch
>
>
> 1. Allow snapshots on directory /foo.
> 2. Create a file /foo/test/bar and start writing to it.
> 3. After a block is allocated and some data has been written to it, create a 
> snapshot s1 of /foo.
> 4. Delete the directory /foo/test.
> 5. Wait for a checkpoint, or run saveNamespace.
> 6. Restart the NN.
> The NN enters safemode.
> Analysis:
> Snapshot inodes loaded from the fsimage are always complete, and all of their 
> blocks are in COMPLETE state. 
> So when the DataNode reports RBW blocks, they are not updated in the 
> blocks map.
> Some of the FINALIZED blocks are marked as corrupt due to a length mismatch.
> As a result, these blocks are never counted as safe and the NN cannot leave 
> safemode.
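A toy model of the analysis above may help show why the NN gets stuck. It assumes one replica per block, a map of COMPLETE block lengths as loaded from the fsimage, and a simplified exit check of `safe >= ceil(threshold * total)` using 0.999 (the default of `dfs.namenode.safemode.threshold-pct`). `ReplicaState`, `ReportedReplica`, `ReportOutcome` and `SafeModeModel` are hypothetical names; the real block report processing in the NameNode is more involved.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical subset of replica states a DataNode can report. */
enum ReplicaState { RBW, FINALIZED }

/** Hypothetical block report entry: block id, replica length and replica state. */
final class ReportedReplica {
    final long blockId;
    final long numBytes;
    final ReplicaState state;

    ReportedReplica(long blockId, long numBytes, ReplicaState state) {
        this.blockId = blockId;
        this.numBytes = numBytes;
        this.state = state;
    }
}

/** Counts of how the NN (in this model) treated each reported replica. */
final class ReportOutcome {
    int safe;    // FINALIZED replica whose length matches the fsimage
    int ignored; // RBW replica of a COMPLETE block: not added to the blocks map
    int corrupt; // FINALIZED replica whose length differs from the fsimage
}

final class SafeModeModel {
    // Default value of dfs.namenode.safemode.threshold-pct.
    static final double THRESHOLD_PCT = 0.999;

    /**
     * Every block loaded from the fsimage, including the snapshot copy of the
     * file that was being written, is COMPLETE with its recorded length.
     * Assumes one replica (one report entry) per block.
     */
    static ReportOutcome process(Map<Long, Long> completeBlockLengths,
                                 List<ReportedReplica> report) {
        ReportOutcome out = new ReportOutcome();
        for (ReportedReplica r : report) {
            Long expected = completeBlockLengths.get(r.blockId);
            if (expected == null) {
                continue; // unknown block: out of scope for this model
            }
            if (r.state == ReplicaState.RBW) {
                out.ignored++;
            } else if (r.numBytes != expected) {
                out.corrupt++;
            } else {
                out.safe++;
            }
        }
        return out;
    }

    /** Simplified exit check: safe blocks must reach ceil(threshold * total). */
    static boolean canLeaveSafeMode(int safeBlocks, long totalBlocks) {
        return safeBlocks >= Math.ceil(THRESHOLD_PCT * totalBlocks);
    }
}
```

For example, with 1000 blocks in the fsimage the model needs 999 safe blocks; if the snapshot copy of the file being written contributes one ignored RBW block and one corrupt FINALIZED block, only 998 are safe and the NN stays in safemode.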



--
This message was sent by Atlassian JIRA
(v6.1#6144)
