[ 
https://issues.apache.org/jira/browse/HDFS-13314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420978#comment-16420978
 ] 

Yongjun Zhang commented on HDFS-13314:
--------------------------------------

{quote}

Hi Yongjun, thanks for looking at the Jira! Please also post your comments in 
the Jira for support. 

 
 # Yes, we saw duplicate entries.
 # The crash we saw was an NPE caused by the referred INode being absent. The 
check looks for such dangling references. I don’t think we have seen a crash at 
the location you pointed out.

    private INodeReference loadINodeReference(
        INodeReferenceSection.INodeReference r) throws IOException {
      long referredId = r.getReferredId();
      INode referred = fsDir.getInode(referredId);
      // Crashes on the next line because referred is null.
      WithCount withCount = (WithCount) referred.getParentReference();
 # We have not seen misordered entries yet. Also, the *!misordered* check was 
deliberate: once there is one such entry, the whole list is compromised.
 # The assertion actually results in a runtime exception, which fails the 
request. However, we suspect that the list was somehow corrupted by other 
means, not by the insert call. We are not sure how it happened.

 

Let me know if you have any concerns or ideas for improving the checks. We can 
certainly do a follow-up Jira.

{quote}
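
To make the dangling-reference case in point 2 of the quote concrete, here is a minimal sketch of that kind of check. It assumes simplified stand-in types (a plain map from inode id to a name, and a list of referred ids) rather than the real HDFS classes, and the names DanglingReferenceCheck and findDangling are hypothetical. It is not the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: the map and the id list stand in for the
// inode directory and the reference section; they are not HDFS types.
class DanglingReferenceCheck {

    /** Returns every referred id that has no matching inode in the map. */
    static List<Long> findDangling(Map<Long, String> inodesById,
                                   List<Long> referredIds) {
        List<Long> dangling = new ArrayList<>();
        for (long id : referredIds) {
            // A missing entry corresponds to the lookup returning null,
            // which is what later causes the NPE on dereference.
            if (!inodesById.containsKey(id)) {
                dangling.add(id);
            }
        }
        return dangling;
    }
}
```

A non-empty result would be the signal for a caller to report corruption instead of dereferencing a null inode later.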

> NameNode should optionally exit if it detects FsImage corruption
> ----------------------------------------------------------------
>
>                 Key: HDFS-13314
>                 URL: https://issues.apache.org/jira/browse/HDFS-13314
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>            Priority: Major
>             Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2
>
>         Attachments: HDFS-13314.01.patch, HDFS-13314.02.patch, 
> HDFS-13314.03.patch, HDFS-13314.04.patch, HDFS-13314.05.patch
>
>
> The NameNode should optionally exit after writing an FsImage if it detects 
> the following kinds of corruptions:
> # INodeReference pointing to non-existent INode
> # Duplicate entries in snapshot deleted diff list.
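
The second kind of corruption can be illustrated with a short sketch, assuming the deleted diff list is meant to be kept in strictly ascending order so that duplicates show up as equal neighbours. The class DiffListCheck, its Result enum, and the check method are hypothetical names for illustration, not part of HDFS. Misordering is reported first because, as noted in the discussion, a single misordered entry compromises the whole list.

```java
import java.util.List;

// Hypothetical illustration only; not the check implemented in the patch.
class DiffListCheck {

    enum Result { OK, DUPLICATE, MISORDERED }

    /** Scans adjacent pairs of a list expected to be strictly ascending. */
    static <T extends Comparable<T>> Result check(List<T> entries) {
        boolean duplicate = false;
        for (int i = 1; i < entries.size(); i++) {
            int cmp = entries.get(i - 1).compareTo(entries.get(i));
            if (cmp > 0) {
                // Ordering is broken, so neighbour-based duplicate detection
                // can no longer be trusted; stop and report misordering.
                return Result.MISORDERED;
            }
            if (cmp == 0) {
                duplicate = true;
            }
        }
        return duplicate ? Result.DUPLICATE : Result.OK;
    }
}
```

For example, a list such as ("a", "b", "b") would be classified as DUPLICATE, while ("b", "a", "a") would be classified as MISORDERED.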



