[
https://issues.apache.org/jira/browse/HDFS-6353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274738#comment-14274738
]
Hadoop QA commented on HDFS-6353:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12691823/HDFS-6353.001.patch
against trunk revision 5188153.
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 14 new
or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 3 new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-hdfs-project/hadoop-hdfs
hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal.
Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/9195//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/9195//artifact/patchprocess/newPatchFindbugsWarningsbkjournal.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9195//console
This message is automatically generated.
> Handle checkpoint failure more gracefully
> -----------------------------------------
>
> Key: HDFS-6353
> URL: https://issues.apache.org/jira/browse/HDFS-6353
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Reporter: Suresh Srinivas
> Assignee: Jing Zhao
> Attachments: HDFS-6353.000.patch, HDFS-6353.001.patch
>
>
> One failure pattern I have seen is that, in rare circumstances, some
> inconsistency causes the secondary or standby to fail to consume the editlog.
> The only solution when this happens is to save the namespace at the current
> active namenode. But sometimes an unsuspecting admin ends up restarting the
> namenode instead, which then requires a more complicated fix (such as
> ignoring the editlog record that cannot be consumed).
> How about adding the following functionality:
> When the checkpointer (standby or secondary) fails to consume the editlog,
> it lets the active namenode know about the failure, controlled by a
> configurable flag (on/off). The active namenode can then enter safemode and
> save the namespace. While in this type of safemode, the namenode UI also
> shows information about the checkpoint failure and that it is saving the
> namespace. Once the namespace is saved, the namenode can come out of safemode.
> This means service unavailability (even in an HA cluster). But it might be
> worth it to avoid long startup times or the need for other manual fixes.
> Thoughts?
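The flow proposed in the description can be sketched as a small state model.
This is only an illustration under stated assumptions: the names below
(ActiveNameNodeModel, onCheckpointFailure, reportEnabled) are hypothetical,
are not Hadoop APIs, and are not taken from the attached patch. The real
namenode's safemode and save-namespace logic is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the proposed behaviour. Nothing here is a Hadoop API;
 * it only mirrors the steps listed in the issue description.
 */
final class ActiveNameNodeModel {
    enum State { ACTIVE, SAFEMODE_SAVING_NAMESPACE }

    // Stands in for the proposed on/off configuration flag (assumed name).
    private final boolean reportEnabled;
    private State state = State.ACTIVE;
    private int namespaceSaves = 0;
    // Stands in for what the namenode UI would display.
    private final List<String> uiMessages = new ArrayList<>();

    ActiveNameNodeModel(boolean reportEnabled) {
        this.reportEnabled = reportEnabled;
    }

    /**
     * Invoked when the checkpointer (standby or secondary) could not consume
     * the editlog. Returns true if the active namenode reacted to the report.
     */
    boolean onCheckpointFailure(String reason) {
        if (!reportEnabled) {
            return false; // flag is off: the failure is not propagated
        }
        // 1. Enter the special safemode and surface the reason in the UI.
        state = State.SAFEMODE_SAVING_NAMESPACE;
        uiMessages.add("Checkpoint failed (" + reason
                + "); in safemode, saving namespace");
        // 2. Save the namespace (modelled here as a simple counter).
        namespaceSaves++;
        // 3. Leave safemode once the save has completed.
        state = State.ACTIVE;
        uiMessages.add("Namespace saved; left safemode");
        return true;
    }

    State getState() { return state; }

    int getNamespaceSaves() { return namespaceSaves; }

    List<String> getUiMessages() { return new ArrayList<>(uiMessages); }
}
```

With the flag on, a call such as
new ActiveNameNodeModel(true).onCheckpointFailure("unreadable edit op")
records one namespace save and returns the model to ACTIVE; with the flag
off, the call is a no-op and the failure stays local to the checkpointer.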
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)