[ https://issues.apache.org/jira/browse/HDFS-13314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405475#comment-16405475 ]

Tsz Wo Nicholas Sze commented on HDFS-13314:
--------------------------------------------

Thanks [~arpitagarwal]. Some comments on the patch:
- Also print the fsimage file name in the log messages below.
{code:java}
+      LOG.error("Detected " + numErrors + " errors while saving FsImage.");
{code}
{code:java}
+      LOG.fatal("NameNode process will exit now... The saved FsImage is " +
+          "potentially corrupted.");
{code}
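A minimal sketch of the requested messages, assuming the destination File is in scope where the errors are reported. The class and method names are hypothetical, and String.format stands in for whatever logger call the patch ends up using:

{code:java}
import java.io.File;

// Hypothetical helper: builds the error/fatal messages with the fsimage path included.
public class ImageSaveMessages {
    static String errorMessage(long numErrors, File image) {
        return String.format("Detected %d errors while saving FsImage %s.",
            numErrors, image.getAbsolutePath());
    }

    static String fatalMessage(File image) {
        return String.format("NameNode process will exit now... The saved FsImage %s is "
            + "potentially corrupted.", image.getAbsolutePath());
    }

    public static void main(String[] args) {
        File image = new File("/tmp/fsimage_1");
        System.out.println(errorMessage(3, image));
        System.out.println(fatalMessage(image));
    }
}
{code}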

- Add numErrors to the log message below.
{code:java}
+        long numErrors = saveInternal(fout, compression, file.getAbsolutePath());
         LOG.info("Image file {} of size {} bytes saved in {} seconds.", file,
             file.length(), (monotonicNow() - startTime) / 1000);
+        return numErrors;
{code}
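A sketch of the shape described above, assuming saveInternal keeps counting problems instead of aborting on the first one. Everything here is hypothetical: a null entry stands in for a corrupt record, and the summary string mirrors the existing info message with the error count appended:

{code:java}
import java.util.Arrays;
import java.util.List;

public class ImageSaver {
    // Hypothetical stand-in for the real serializer: a null entry counts as one error.
    static long saveInternal(List<String> entries) {
        long numErrors = 0;
        for (String entry : entries) {
            if (entry == null) {
                ++numErrors; // record the problem and keep going so the whole image is checked
            }
        }
        return numErrors;
    }

    // Mirrors the existing info message, with the error count added.
    static String summary(String file, long sizeBytes, long seconds, long numErrors) {
        return String.format(
            "Image file %s of size %d bytes saved in %d seconds with %d errors.",
            file, sizeBytes, seconds, numErrors);
    }

    public static void main(String[] args) {
        long numErrors = saveInternal(Arrays.asList("a", null, "b"));
        System.out.println(summary("/tmp/fsimage_1", 1024, 0, numErrors));
    }
}
{code}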

- Print the full path in the log message below.
{code:java}
+        FSImage.LOG.error("FSImageFormatPBSnapshot: Missing referred INodeId " +
+            ref.getId() + " for INodeReference index " + refIndex);
{code}
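One way to get a full path is to walk parent pointers up to the root and join the local names. The Node class below is a hypothetical stand-in; in HDFS itself the path would presumably come from the INode's own accessor (INode#getFullPathName()) rather than hand-written code like this:

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

public class FullPathSketch {
    // Hypothetical minimal inode: a local name plus a parent pointer (null for the root).
    static final class Node {
        final String name;
        final Node parent;
        Node(String name, Node parent) { this.name = name; this.parent = parent; }
    }

    // Walks parent pointers up to the root and joins the local names with '/'.
    static String fullPath(Node n) {
        Deque<String> parts = new ArrayDeque<>();
        for (Node cur = n; cur != null && cur.parent != null; cur = cur.parent) {
            parts.addFirst(cur.name); // prepend so the topmost directory ends up first
        }
        return "/" + String.join("/", parts);
    }

    public static void main(String[] args) {
        Node root = new Node("", null);
        Node user = new Node("user", root);
        Node file = new Node("data.txt", user);
        System.out.println(fullPath(file)); // /user/data.txt
        System.out.println(fullPath(root)); // /
    }
}
{code}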

- Let's check all INodes, not only INodeReferences. Also, let's use compareTo so that out-of-order entries are detected as well.
{code:java}
          INode previous = null;
          for (INode d : deleted) {
            if (previous != null) {
              final int cmp = d.compareTo(previous.getLocalNameBytes());
              if (cmp <= 0) {
                final String err = cmp == 0 ? "repeated" : "out-of-order";
                FSImage.LOG.error("Names " + err + " in the 'deleted' difflist of directory " ...);
                ++numImageErrors;
              }
            }
            previous = d;
          }
{code}
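The check above can be exercised outside HDFS with plain byte[] names. This sketch assumes Arrays.compare(byte[], byte[]) (Java 9+, signed lexicographic order) as a stand-in for INode#compareTo(byte[]); whether that matches HDFS's byte ordering exactly is an assumption, and the class and method names are hypothetical:

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class DeletedListCheck {
    // Counts adjacent pairs that are not strictly increasing (repeated or out-of-order).
    // Arrays.compare stands in for INode#compareTo(byte[]) here; that is an assumption.
    static int countOrderingErrors(List<byte[]> names) {
        int numImageErrors = 0;
        byte[] previous = null;
        for (byte[] d : names) {
            if (previous != null) {
                final int cmp = Arrays.compare(d, previous);
                if (cmp <= 0) {
                    final String err = cmp == 0 ? "repeated" : "out-of-order";
                    System.err.println("Names " + err + " in the 'deleted' difflist: "
                        + new String(d, StandardCharsets.UTF_8));
                    ++numImageErrors;
                }
            }
            previous = d;
        }
        return numImageErrors;
    }

    static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        System.out.println(countOrderingErrors(Arrays.asList(b("a"), b("b"), b("c")))); // 0
        System.out.println(countOrderingErrors(Arrays.asList(b("a"), b("a"), b("b")))); // 1 (repeated)
        System.out.println(countOrderingErrors(Arrays.asList(b("b"), b("a"), b("c")))); // 1 (out-of-order)
    }
}
{code}

Because previous is updated after every entry, each broken adjacent pair is reported once, rather than every later entry being flagged after a single misplaced name.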

> NameNode should optionally exit if it detects FsImage corruption
> ----------------------------------------------------------------
>
>                 Key: HDFS-13314
>                 URL: https://issues.apache.org/jira/browse/HDFS-13314
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>            Priority: Major
>         Attachments: HDFS-13314.01.patch, HDFS-13314.02.patch
>
>
> The NameNode should optionally exit after writing an FsImage if it detects 
> the following kinds of corruptions:
> # INodeReference pointing to a non-existent INode
> # Duplicate entries in the snapshot deleted diff list
> This behavior is controlled via an undocumented configuration setting, and 
> disabled by default.
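The policy in the issue description reduces to a small decision: detected errors on their own only produce log messages, and the process exits only when the opt-in setting is enabled. This is a hypothetical sketch; the real configuration key is not named in the issue, so a plain boolean stands in for it:

{code:java}
// Hypothetical sketch: corruption found while writing an FsImage terminates the
// process only when an opt-in flag (disabled by default) is set.
public class CorruptionPolicy {
    enum Action { CONTINUE, LOG_ONLY, EXIT }

    static Action decide(long numErrors, boolean exitOnCorruption) {
        if (numErrors == 0) {
            return Action.CONTINUE;
        }
        return exitOnCorruption ? Action.EXIT : Action.LOG_ONLY;
    }

    public static void main(String[] args) {
        boolean exitOnCorruption = false; // disabled by default, per the issue description
        long numErrors = 2;
        Action action = decide(numErrors, exitOnCorruption);
        System.out.println(action); // LOG_ONLY
        if (action == Action.EXIT) {
            // The real NameNode would terminate here; its exit mechanism is not shown.
            System.exit(1);
        }
    }
}
{code}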



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
