[ https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208168#comment-13208168 ]
Todd Lipcon commented on HDFS-2949:
-----------------------------------
I think we should probably un-document the transitionTo* commands, but leave
them in place as a safety valve. It's nice to have direct access to these RPCs
just in case one of the safer methods has a problem and you need a workaround
without recompiling the client.
That said, the safety check described in this JIRA is still valuable, even
when using haadmin -failover, in case the admin's configuration is broken in
some way (e.g. the fencing script returns true but did not in fact fence the
standby correctly).
> HA: Add check to active state transition to prevent operator-induced split
> brain
> --------------------------------------------------------------------------------
>
> Key: HDFS-2949
> URL: https://issues.apache.org/jira/browse/HDFS-2949
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
>
> Currently, if the administrator mistakenly calls "-transitionToActive" on one
> NN while the other one is still active, both NNs will act as active at the
> same time (split brain). We can add
> a simple check by having the NN make a getServiceState() RPC to its peer with
> a short (~1 second?) timeout. If the RPC succeeds and indicates the other
> node is active, it should refuse to enter active mode. If the RPC fails or
> indicates standby, it can proceed.
> This is just meant as a preventative safety check - we still expect users to
> use the "-failover" command which has other checks plus fencing built in.
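
The check described above could be sketched roughly as follows. This is only
an illustration under stated assumptions, not the HDFS implementation: the
class SplitBrainCheck, the Peer interface, and mayBecomeActive are hypothetical
names, Peer stands in for the real getServiceState() RPC to the other NN, and
refusing when the calling thread is interrupted is an added assumption that
the description does not cover.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of the pre-transition peer check; not HDFS code. */
final class SplitBrainCheck {

    enum ServiceState { ACTIVE, STANDBY }

    /** Hypothetical stand-in for the getServiceState() RPC to the peer NN. */
    interface Peer {
        ServiceState getServiceState() throws Exception;
    }

    /**
     * Returns false only if the peer answers within the timeout and reports
     * ACTIVE. A failed or timed-out call, or a STANDBY answer, returns true.
     */
    static boolean mayBecomeActive(Peer peer, long timeoutMillis) {
        // Run the call on a daemon thread so a hung peer cannot block us
        // beyond the timeout or keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "peer-state-check");
            t.setDaemon(true);
            return t;
        });
        try {
            Callable<ServiceState> call = peer::getServiceState;
            Future<ServiceState> reply = executor.submit(call);
            ServiceState state = reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return state != ServiceState.ACTIVE;
        } catch (TimeoutException | ExecutionException e) {
            // RPC failed or did not answer in time: proceed.
            return true;
        } catch (InterruptedException e) {
            // Assumption: refuse if we were interrupted while waiting.
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With a timeout of about 1000 ms, a caller would refuse the transition when
mayBecomeActive returns false. Note that an unreachable peer lets the
transition proceed, so this guards against operator error but is no
substitute for fencing.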