[ 
https://issues.apache.org/jira/browse/HDFS-13813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDFS-13813:
------------------------------
    Description: 
Recently, the same stack trace as in -HDFS-9406- appeared again in the field. 
The symptom is that *loadINodeDirectorySection()* cannot find a child inode in 
inodeMap when it looks up an inode id taken from the directory's children list. 
The child inode could be missing or deleted.

So far we do not have a clear trace to reproduce the problem. Therefore, 
I'm proposing this improvement to detect such corruption (data structure 
inconsistency) when saving the FsImage, so that we can capture the FsImage and 
Edit Log and, hopefully, reproduce the problem reliably.

 

In a previous patch, HDFS-13314, [~arpitagarwal] did a great job catching 
potential FsImage corruption in two cases. This patch adds a third case: a 
child inode that does not exist in the global FSDirectory dir when saving 
(serializing) INodeDirectorySection.
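
To illustrate the kind of check described above, here is a minimal sketch. It 
is not the code in the attached patch. The class DanglingChildCheck, the Inode 
type and both method names are hypothetical stand-ins for the real HDFS 
structures, and a plain Map plays the role of the global inode map. The idea 
is simply to look up every id in a directory's children list before the 
directory entry is written, and to fail loudly if any id has no inode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the HDFS types; this is not the HDFS-13813 patch.
final class DanglingChildCheck {

  /** Minimal inode: an id plus, for directories, the ids of its children. */
  static final class Inode {
    final long id;
    final List<Long> childIds = new ArrayList<>();

    Inode(long id) {
      this.id = id;
    }
  }

  /**
   * Returns the child ids of {@code dir} that have no entry in the global
   * id-to-inode map, i.e. the dangling references.
   */
  static List<Long> findDanglingChildren(Inode dir, Map<Long, Inode> inodeMap) {
    List<Long> dangling = new ArrayList<>();
    for (long childId : dir.childIds) {
      if (!inodeMap.containsKey(childId)) {
        dangling.add(childId);
      }
    }
    return dangling;
  }

  /**
   * Save-side guard: refuses to serialize a directory entry whose children
   * list references an inode that is missing from the map.
   */
  static void checkBeforeSave(Inode dir, Map<Long, Inode> inodeMap) {
    List<Long> dangling = findDanglingChildren(dir, inodeMap);
    if (!dangling.isEmpty()) {
      throw new IllegalStateException("Directory inode " + dir.id
          + " references missing child inode(s) " + dangling);
    }
  }
}
```

The sketch throws an exception so that the caller decides how to react. The 
issue title suggests exiting the NameNode at that point, so that the 
inconsistent state is caught at save time rather than at the next load.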

  was:
Recently, the same stack trace as in -HDFS-9406- appears again in the field. 
The symptom of the problem is that *loadINodeDirectorySection()* can't find a 
child inode in inodeMap by the node id in the children list of the directory. 
The child inode could be missing or deleted.

As for now we didn't have a clear trace to reproduce the problem. Therefore, 
I'm proposing this improvement to detect such corruption (data structure 
inconsistency) when saving the FsImage, so that we can have the FsImage and 
Edit Log to potentially stably reproduce the problem.

 

In a previous patch HDFS-13314, [~arpitagarwal] did a great job catching 
potential FsImage corruption in two cases. Further, this patch would detect if 
a child inode exist in the global FSDirectory dir when saving (serializing) 
INodeDirectorySection.


> Exit NameNode when dangling child inode is detected when saving FsImage
> -----------------------------------------------------------------------
>
>                 Key: HDFS-13813
>                 URL: https://issues.apache.org/jira/browse/HDFS-13813
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.0.3
>            Reporter: Siyao Meng
>            Assignee: Siyao Meng
>            Priority: Major
>         Attachments: HDFS-13813.001.patch
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
