[ https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897716#comment-16897716 ]
Wei-Chiu Chuang commented on HDFS-13101:
----------------------------------------
Thanks [~shashikant], that is tremendous!! A unit test like this is critical to
the final fix. I was able to reduce the unit test further.
Namely,
1. Create the following directories
{noformat}
/dir1
  /dira
    /dirb
  /dirx
/dir2
{noformat}
2. Create snapshot s0 at /dir1
3. Add one file, /dir1/dira/dirb/file1
{noformat}
/dir1
  /dira
    /dirb
      /file1
  /dirx
/dir2
{noformat}
4. Move /dir1/dira/dirb to /dir1/dirx/dirb
{noformat}
/dir1
  /dira
  /dirx
    /dirb
      /file1
/dir2
{noformat}
5. Create a snapshot s1 at /dir1
6. Append to file /dir1/dirx/dirb/file1
7. Create /dir2/dira
{noformat}
/dir1
  /dira
  /dirx
    /dirb
      /file1
/dir2
  /dira
{noformat}
8. Move /dir1/dirx/dirb to /dir2/dira/dirb
{noformat}
/dir1
  /dira
  /dirx
/dir2
  /dira
    /dirb
      /file1
{noformat}
9. Delete /dir2/dira/dirb
{noformat}
/dir1
  /dira
  /dirx
/dir2
  /dira
{noformat}
10. Delete snapshot s1
11. Save the fsimage and restart
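To make the expected end state explicit, here is a toy Java model of the eleven steps above. It is a sketch under heavy assumptions: the class and method names are hypothetical, directories are plain in-memory trees, and each snapshot is a deep copy of /dir1, whereas HDFS records snapshot diffs and uses INodeReference for renamed inodes. It does not reproduce the bug; it only computes the namespace a correct fsimage should describe after step 11.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the reduced HDFS-13101 steps (hypothetical; not HDFS code).
// Snapshots are deep copies here, unlike HDFS's diff-based snapshots.
final class Hdfs13101Model {
    static final class Node {
        final boolean isDir;
        final TreeMap<String, Node> children = new TreeMap<>(); // sorted for stable output
        Node(boolean isDir) { this.isDir = isDir; }
        Node deepCopy() {
            Node copy = new Node(isDir);
            children.forEach((name, child) -> copy.children.put(name, child.deepCopy()));
            return copy;
        }
    }

    static String parentOf(String path) { return path.substring(0, path.lastIndexOf('/')); }
    static String nameOf(String path) { return path.substring(path.lastIndexOf('/') + 1); }

    // Walks an absolute path from the root; fails loudly on a missing component.
    static Node resolve(Node root, String path) {
        Node cur = root;
        for (String part : path.split("/")) {
            if (part.isEmpty()) continue;
            cur = cur.children.get(part);
            if (cur == null) throw new IllegalArgumentException("no such path: " + path);
        }
        return cur;
    }

    static void create(Node root, String path, boolean isDir) {
        resolve(root, parentOf(path)).children.put(nameOf(path), new Node(isDir));
    }

    static void rename(Node root, String src, String dst) {
        Node moved = resolve(root, parentOf(src)).children.remove(nameOf(src));
        if (moved == null) throw new IllegalArgumentException("no such path: " + src);
        resolve(root, parentOf(dst)).children.put(nameOf(dst), moved);
    }

    static void delete(Node root, String path) {
        resolve(root, parentOf(path)).children.remove(nameOf(path));
    }

    // Pre-order list of every path below n, relative to n.
    static List<String> paths(Node n) {
        List<String> out = new ArrayList<>();
        collect(n, "", out);
        return out;
    }

    static void collect(Node n, String prefix, List<String> out) {
        for (Map.Entry<String, Node> e : n.children.entrySet()) {
            String p = prefix + "/" + e.getKey();
            out.add(p);
            collect(e.getValue(), p, out);
        }
    }

    static Map<String, List<String>> run() {
        Node root = new Node(true);
        Map<String, Node> snapshots = new TreeMap<>();
        for (String d : new String[] {"/dir1", "/dir1/dira", "/dir1/dira/dirb", "/dir1/dirx", "/dir2"}) {
            create(root, d, true);                                  // step 1
        }
        snapshots.put("s0", resolve(root, "/dir1").deepCopy());     // step 2
        create(root, "/dir1/dira/dirb/file1", false);               // step 3
        rename(root, "/dir1/dira/dirb", "/dir1/dirx/dirb");         // step 4
        snapshots.put("s1", resolve(root, "/dir1").deepCopy());     // step 5
        // step 6 (append) changes file data only, so the tree is unchanged.
        create(root, "/dir2/dira", true);                           // step 7
        rename(root, "/dir1/dirx/dirb", "/dir2/dira/dirb");         // step 8
        delete(root, "/dir2/dira/dirb");                            // step 9
        snapshots.remove("s1");                                     // step 10
        // step 11: a saved and reloaded fsimage should describe exactly this state.
        Map<String, List<String>> state = new TreeMap<>();
        state.put("current", paths(root));
        snapshots.forEach((name, snap) -> state.put("/dir1/.snapshot/" + name, paths(snap)));
        return state;
    }

    public static void main(String[] args) {
        run().forEach((view, ps) -> System.out.println(view + " -> " + ps));
    }
}
```

Running main() prints `/dir1/.snapshot/s0 -> [/dira, /dira/dirb, /dirx]` and `current -> [/dir1, /dir1/dira, /dir1/dirx, /dir2, /dir2/dira]`: dirb survives only through s0, with no children, and file1 appears in no surviving view.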
At the point of detection, there is an INodeReference dirb pointing to
INodeDirectory dirb.
dirb is in snapshot s0, which is not deleted.
file1 is not in snapshot s0, so its inode was deleted. But dirb's child list
still contains file1.
So the problem is that INodeDirectory fails to remove the child inode from its
child list: dirb is in snapshot s0 but file1 is not, so file1 should have been
removed.
The bug is probably somewhere within
{{DirectoryWithSnapshotFeature#cleanDirectory()}}.
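The invariant being violated can be phrased as a set difference: a child that dirb still lists, but that no surviving view of dirb can see, is stale. Below is a hypothetical Java check of that invariant; StaleChildCheck and staleChildren are illustrative names, not HDFS APIs, and this is not how cleanDirectory() works internally. It is fed with the state described above, assuming the current tree no longer contains dirb (it was deleted in step 9), so s0 is its only surviving view.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical consistency check (not an HDFS API): a child that a directory
// still lists, but that no surviving view of that directory can see, is stale.
final class StaleChildCheck {
    static List<String> staleChildren(List<String> listedChildren,
                                      Collection<? extends Collection<String>> survivingViews) {
        // Union of the children visible in every view that still references the directory.
        Set<String> visible = new HashSet<>();
        for (Collection<String> view : survivingViews) {
            visible.addAll(view);
        }
        // Anything listed but visible nowhere should have been cleaned up.
        List<String> stale = new ArrayList<>();
        for (String child : listedChildren) {
            if (!visible.contains(child)) {
                stale.add(child);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // dirb's child list at detection time still holds file1, while s0
        // captured dirb before file1 was created.
        List<String> dirbChildren = List.of("file1");
        List<String> dirbInS0 = List.of();
        System.out.println(staleChildren(dirbChildren, List.of(dirbInS0)));
    }
}
```

With s0 as the only surviving view, the check returns `[file1]`, matching the leftover entry observed at detection time; had s1 still existed, file1 would be visible and the result would be empty.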
> Yet another fsimage corruption related to snapshot
> --------------------------------------------------
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Yongjun Zhang
> Assignee: Siyao Meng
> Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.corruption_repro.patch,
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw a case similar to HDFS-9406, even though the HDFS-9406 fix is
> present, so it's likely another case not covered by the fix. We are currently
> trying to collect good fsimage + editlogs to replay to reproduce it and
> investigate.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)