[
https://issues.apache.org/jira/browse/HDFS-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321801#comment-14321801
]
Hadoop QA commented on HDFS-7798:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12698944/HDFS-7798.01.patch
against trunk revision 3338f6d.
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:red}-1 eclipse:eclipse{color}. The patch failed to build with
eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/9587//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9587//console
This message is automatically generated.
> Checkpointing failure caused by shared KerberosAuthenticator
> ------------------------------------------------------------
>
> Key: HDFS-7798
> URL: https://issues.apache.org/jira/browse/HDFS-7798
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: security
> Reporter: Chengbing Liu
> Assignee: Chengbing Liu
> Priority: Critical
> Attachments: HDFS-7798.01.patch
>
>
> We have observed occasional checkpointing failures in our real cluster. The
> standby NameNode was not able to upload the image to the active NameNode.
> After some digging, the root cause appears to be a shared
> {{KerberosAuthenticator}} in {{URLConnectionFactory}}. The authenticator is
> designed as a use-once instance and is not stateless: it has attributes such
> as {{HttpURLConnection}} and {{URL}}. When multiple threads call
> {{URLConnectionFactory#openConnection(...)}}, they race on the state of the
> shared authenticator, and the image upload fails.
> Therefore, as a first step that does not break the current API, I propose we
> create a new {{KerberosAuthenticator}} instance for each connection, to make
> checkpointing work. We may consider making the {{Authenticator}} design and
> implementation stateless afterwards, as {{ConnectionConfigurator}} already is.
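
The failure mode described above can be illustrated with a minimal sketch. This is
not the Hadoop code: StatefulAuthenticator, its begin/finish methods, and the two
URLs are hypothetical stand-ins for an authenticator that keeps per-request state.
The interleaving that two threads could produce is replayed sequentially so the
result is deterministic.

```java
import java.util.function.Supplier;

class AuthenticatorSketch {
    public static void main(String[] args) {
        // Hypothetical endpoints, for illustration only.
        String urlA = "http://nn-a.example/imagetransfer";
        String urlB = "http://nn-b.example/imagetransfer";

        // One instance handed to every caller: request A ends up with B's state.
        StatefulAuthenticator shared = new StatefulAuthenticator();
        System.out.println("shared:         " + interleave(() -> shared, urlA, urlB));

        // A fresh instance per connection: request A keeps its own state.
        System.out.println("per-connection: " + interleave(StatefulAuthenticator::new, urlA, urlB));
    }

    // Replays the problematic ordering: A begins, B begins, then A finishes.
    static String interleave(Supplier<StatefulAuthenticator> source, String urlA, String urlB) {
        StatefulAuthenticator forA = source.get();
        forA.begin(urlA);
        StatefulAuthenticator forB = source.get();
        forB.begin(urlB);
        return forA.finish();
    }
}

// Hypothetical stand-in for a use-once authenticator with per-request fields.
class StatefulAuthenticator {
    private String url; // per-request state, overwritten by every begin() call

    void begin(String url) {
        this.url = url;
    }

    String finish() {
        return "authenticated " + url;
    }
}
```

With the shared instance, the call made on behalf of the first URL finishes with
the second URL's state. Supplying a new instance per connection avoids this without
changing the caller-facing method, which is the shape of the first step proposed
above.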