[
https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jing Zhao updated HDFS-5428:
----------------------------
Attachment: HDFS-5428.000.patch
Continuing the discussion from HDFS-5443 here.
HDFS-5428.000.patch is a simple patch that implements an idea similar to the
one mentioned in HDFS-5443:
1) Record extra information in the fsimage to identify INodeFileUC instances
that exist only in snapshots. To stay compatible, we keep this information in
the "under-construction-files" section of the fsimage and simply use
".snapshot" as their paths.
2) Identify these snapshot files while loading the fsimage, and temporarily
store them in a map in SnapshotManager.
3) When computing the total number of blocks at NN startup, deduct the files
recorded in 2) in addition to the files recorded in the lease map.
In general the idea is very similar to Vinay's patch. The difference is that we
do not keep and maintain records in the lease map and only handle these files
when starting the NN. We can even clear the records in SnapshotManager after
computing the total number of blocks.
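To make steps 2) and 3) concrete, here is a minimal standalone sketch. It is
not code from the patch: the class, the method names and the inode-id key are
all hypothetical, and it assumes each under-construction file contributes
exactly one incomplete block.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model only; none of these names are Hadoop APIs. */
final class SnapshotUcStartupSketch {
    // Step 2: snapshot-only under-construction files found while loading the
    // fsimage, keyed by a hypothetical inode id.
    private final Map<Long, String> snapshotOnlyUcFiles = new HashMap<>();

    /** Called by a (hypothetical) image loader for each ".snapshot" entry. */
    void recordSnapshotOnlyUcFile(long inodeId, String name) {
        snapshotOnlyUcFiles.put(inodeId, name);
    }

    /**
     * Step 3: number of blocks expected to be complete at startup.
     * Assumption: one incomplete block per under-construction file.
     */
    long completeBlockTotal(long totalBlocks, int leasedUcFiles) {
        long complete = totalBlocks - leasedUcFiles - snapshotOnlyUcFiles.size();
        // The records are only needed for this computation, so drop them.
        snapshotOnlyUcFiles.clear();
        return complete;
    }

    /** Number of snapshot-only records still held. */
    int pendingRecords() {
        return snapshotOnlyUcFiles.size();
    }

    public static void main(String[] args) {
        SnapshotUcStartupSketch sketch = new SnapshotUcStartupSketch();
        sketch.recordSnapshotOnlyUcFile(1001L, "bar");
        System.out.println(sketch.completeBlockTotal(10, 2));
    }
}
```

The point of the sketch is that the lease map is never touched: the
snapshot-only records live in a separate map and are discarded once the total
has been computed.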
One more case we may need to handle: if we remove the 0-sized blocks
(HDFS-5443), a snapshot may contain an under-construction file that has no
corresponding blockUC. In that case we should not record extra information in
the fsimage for this kind of INodeFileUC.
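As a rough sketch of that check (hypothetical names only, not taken from the
patch), the decision whether to write the extra ".snapshot" record could be
expressed as a simple predicate:

```java
/** Illustrative only; the class and its parameters are not Hadoop APIs. */
final class SnapshotUcRecordFilter {
    /**
     * Returns true only for a file that is under construction, exists only
     * in snapshots, and still has an under-construction block. A file whose
     * 0-sized last block was removed has nothing to deduct, so it is skipped.
     */
    static boolean shouldRecord(boolean underConstruction,
                                boolean onlyInSnapshot,
                                boolean hasUcBlock) {
        return underConstruction && onlyInSnapshot && hasUcBlock;
    }

    public static void main(String[] args) {
        // Snapshot-only UC file whose 0-sized block was removed.
        System.out.println(shouldRecord(true, true, false));
    }
}
```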
The current patch is just for demonstration. It can pass the new unit tests in
Vinay's patch. If folks think the general idea is ok, we can continue our work
based on this patch.
> under construction files deletion after snapshot+checkpoint+nn restart leads
> nn safemode
> ----------------------------------------------------------------------------------------
>
> Key: HDFS-5428
> URL: https://issues.apache.org/jira/browse/HDFS-5428
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Vinay
> Assignee: Vinay
> Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.patch
>
>
> 1. allow snapshots under dir /foo
> 2. create a file /foo/test/bar and start writing to it
> 3. create a snapshot s1 under /foo after block is allocated and some data has
> been written to it
> 4. Delete the directory /foo/test
> 5. wait for a checkpoint or run saveNamespace
> 6. restart NN.
> NN enters safemode.
> Analysis:
> Snapshot nodes loaded from fsimage are always complete and all blocks will be
> in COMPLETE state.
> So when the Datanode reports RBW blocks those will not be updated in
> blocksmap.
> Some of the FINALIZED blocks will be marked as corrupt due to length mismatch.
--
This message was sent by Atlassian JIRA
(v6.1#6144)