[
https://issues.apache.org/jira/browse/HADOOP-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741771#comment-13741771
]
Sanjay Radia commented on HADOOP-9880:
--------------------------------------
We saw exactly the same error during a test this morning.
The 2 Jiras that caused this problem are the recent HADOOP-9421 and the
earlier HDFS-3083.
HADOOP-9421 improved the SASL protocol.
ZKFC uses Kerberos, but the server side initiates the token-based challenge
just in case the client wants to use a token. As part of doing that, the server
calls secretManager.checkAvailableForRead(), which fails because the NN is in
standby.
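To make the failure mode concrete, here is a minimal, self-contained sketch.
It is not the Hadoop code: the class, the enum and the two connect methods are
hypothetical stand-ins, and only the name checkAvailableForRead() is taken from
the description above. The first method models a server that sets up token auth
for every connection; the second models one possible direction, where the
availability check runs only if the client actually picks token auth.

```java
class SaslNegotiationSketch {
    /** Hypothetical stand-in for the exception raised while the NN is in standby. */
    static class StandbyException extends Exception {
        StandbyException(String message) { super(message); }
    }

    /** Auth methods a client may choose; names are illustrative only. */
    enum AuthMethod { TOKEN, KERBEROS }

    /** Hypothetical secret manager; only the availability check is modeled. */
    static class SecretManager {
        private final boolean standby;

        SecretManager(boolean standby) { this.standby = standby; }

        void checkAvailableForRead() throws StandbyException {
            if (standby) {
                throw new StandbyException("secret manager is not readable in standby");
            }
        }
    }

    /** Models the problem: the token setup (and its check) runs for every client. */
    static String connectUnconditional(SecretManager sm, AuthMethod clientChoice)
            throws StandbyException {
        sm.checkAvailableForRead(); // runs even when the client uses Kerberos
        return "authenticated via " + clientChoice;
    }

    /** Assumed alternative: run the check only when the client selects token auth. */
    static String connectConditional(SecretManager sm, AuthMethod clientChoice)
            throws StandbyException {
        if (clientChoice == AuthMethod.TOKEN) {
            sm.checkAvailableForRead();
        }
        return "authenticated via " + clientChoice;
    }

    public static void main(String[] args) throws StandbyException {
        SecretManager standbyNn = new SecretManager(true);
        try {
            connectUnconditional(standbyNn, AuthMethod.KERBEROS);
        } catch (StandbyException e) {
            System.out.println("unconditional: rejected (" + e.getMessage() + ")");
        }
        System.out.println("conditional: "
                + connectConditional(standbyNn, AuthMethod.KERBEROS));
    }
}
```

In this sketch a Kerberos client such as ZKFC is rejected by the unconditional
variant on a standby NN, while the conditional variant lets it through and
still rejects token clients.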
It is really bizarre that there is a check for the server's state (active or
standby) as part of SASL. This was introduced in HDFS-3083 to deal with a
failover bug. In HDFS-3083, Aaron noted that he does not like the solution:
"I'm not in love with this solution, as it leaks abstractions all over the
place,". The abstraction layer violation finally caught up with us.
It turns out that even prior to Daryn's HADOOP-9421 a similar problem could have
occurred if the ZKFC had used Kerberos for the first connection and tokens for
any subsequent connections.
An immediate fix is required for what HADOOP-9421 broke, but I believe we
also need to fix the fix that HDFS-3083 introduced: the abstraction layer
violations need to be cleaned up.
> RPC Server should not unconditionally create SaslServer with Token auth.
> ------------------------------------------------------------------------
>
> Key: HADOOP-9880
> URL: https://issues.apache.org/jira/browse/HADOOP-9880
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.1.0-beta
> Reporter: Kihwal Lee
> Priority: Blocker
>
> buildSaslNegotiateResponse() will create a SaslRpcServer with TOKEN auth.
> When create() is called against it, secretManager.checkAvailableForRead() is
> called, which fails in HA standby. Thus HA standby nodes cannot be
> transitioned to active.