[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897982#comment-16897982
 ] 

Shashikant Banerjee commented on HDFS-13101:
--------------------------------------------

Thanks [~jojochuang] for simplifying the test case. The bug most likely lies 
in the following code path:
{code:java}
// In DirectoryWithSnapshotFeature#cleanDirectory():

// check priorDiff again since it may be created during the diff deletion
if (prior != NO_SNAPSHOT_ID) {
  DirectoryDiff priorDiff = this.getDiffs().getDiffById(prior);
  if (priorDiff != null && priorDiff.getSnapshotId() == prior) {
    // For files/directories created between "prior" and "snapshot", 
    // we need to clear snapshot copies for "snapshot". Note that we must
    // use null as prior in the cleanSubtree call. Files/directories that
    // were created before "prior" will be covered by the later 
    // cleanSubtreeRecursively call.
    if (priorCreated != null) {
      // we only check the node originally in prior's created list
      for (INode cNode : priorDiff.diff.getCreatedUnmodifiable()) {
        if (priorCreated.containsKey(cNode)) {
          cNode.cleanSubtree(reclaimContext, snapshot, NO_SNAPSHOT_ID);
        }
      }
    }
    // ... (rest of cleanDirectory() omitted)
  }
}
{code}
For any entry created under the directory after the prior snapshot (S0 in the 
previous example), the subtree is traversed, and only the inodes that have a 
diff record associated with the snapshot being deleted are cleaned, provided 
no other reference to them is left (they no longer exist in the active fs). 
This happens as part of deleting the snapshot diff. However, the corresponding 
entries are not removed from the child list of the parent, which leaves 
dangling references to the child inode in the directory. All the descendants 
of dir ("dirb" in the above example) that don't have any diff associated with 
the snapshot being deleted (s1 in the above example) are left behind after the 
snapshot deletion, even though they have already been deleted from the active 
fs.
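
To make the dangling-reference scenario concrete, here is a minimal toy model 
in plain Java. It is only a sketch under stated assumptions: the Node class, 
the inodeMap and every method name below are hypothetical and are not the 
HDFS classes or the actual patch. It only shows how cleaning an inode without 
unlinking it from its parent leaves a stale entry in the parent's child list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are the real HDFS classes.
class DanglingChildSketch {

  /** Hypothetical stand-in for an inode with a child list. */
  static class Node {
    final long id;
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(long id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  /** Creates a child, links it under parent, and registers it in the map. */
  static Node addChild(Map<Long, Node> inodeMap, Node parent, long id, String name) {
    Node child = new Node(id, name);
    parent.children.add(child);
    inodeMap.put(id, child);
    return child;
  }

  /** Cleanup that forgets the parent: the child list keeps a stale entry. */
  static void cleanWithoutUnlink(Map<Long, Node> inodeMap, Node child) {
    inodeMap.remove(child.id);
  }

  /** Contrast case (an illustration, not the actual fix): also unlink from the parent. */
  static void cleanAndUnlink(Map<Long, Node> inodeMap, Node parent, Node child) {
    inodeMap.remove(child.id);
    parent.children.remove(child);
  }

  /** Names of children that are still listed but no longer in the map. */
  static List<String> danglingChildren(Map<Long, Node> inodeMap, Node parent) {
    List<String> dangling = new ArrayList<>();
    for (Node c : parent.children) {
      if (!inodeMap.containsKey(c.id)) {
        dangling.add(c.name);
      }
    }
    return dangling;
  }

  public static void main(String[] args) {
    Map<Long, Node> inodeMap = new HashMap<>();
    Node dir = new Node(1L, "dir");
    inodeMap.put(dir.id, dir);
    Node dirb = addChild(inodeMap, dir, 2L, "dirb");

    // Clean the child but leave the parent's child list untouched.
    cleanWithoutUnlink(inodeMap, dirb);
    System.out.println("dangling after cleanup: " + danglingChildren(inodeMap, dir));
  }
}
```

In this toy model, a check like danglingChildren() is roughly the kind of 
consistency test that would flag the leftover entry; how the real fsimage 
loader detects it is not covered here.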

 

> Yet another fsimage corruption related to snapshot
> --------------------------------------------------
>
>                 Key: HDFS-13101
>                 URL: https://issues.apache.org/jira/browse/HDFS-13101
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yongjun Zhang
>            Assignee: Siyao Meng
>            Priority: Major
>         Attachments: HDFS-13101.001.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw a case similar to HDFS-9406, even though the HDFS-9406 fix is 
> present, so it is likely another case not covered by that fix. We are 
> currently trying to collect a good fsimage and edit logs to replay, so that 
> we can reproduce and investigate it. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

