[
https://issues.apache.org/jira/browse/HADOOP-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250294#comment-13250294
]
Todd Lipcon commented on HADOOP-8247:
-------------------------------------
I also ran the manual tests again. Here's the usage output of HAAdmin:
{code}
Usage: DFSHAAdmin [-ns <nameserviceId>]
[-transitionToActive [--forcemanual] <serviceId>]
[-transitionToStandby [--forcemanual] <serviceId>]
[-failover [--forcefence] [--forceactive] [--forcemanual] <serviceId> <serviceId>]
[-getServiceState <serviceId>]
[-checkHealth <serviceId>]
[-help <command>]
--forcemanual allows the manual failover commands to be used
even when automatic failover is enabled. This
flag is DANGEROUS and should only be used with
expert guidance.
{code}
Here's what happens if I try to use a state change command with auto-HA enabled:
{code}
$ ./bin/hdfs haadmin -transitionToActive nn1
Automatic failover is enabled for NameNode at todd-w510/127.0.0.1:8021
Refusing to manually manage HA state, since it may cause
a split-brain scenario or other incorrect state.
If you are very sure you know what you are doing, please
specify the forcemanual flag.
$ echo $?
255
{code}
Also checked the other two state-changing ops ({{transitionToStandby}} and {{failover}}); they yielded the same error message.
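To make the behavior concrete, the check boils down to something like the sketch below (illustrative only, not the actual HAAdmin code; the class and method names are made up):
{code}
// Illustrative sketch only -- not the actual HAAdmin code. The class and
// method names here are made up for the example.
public class ManualHaGuard {

  /** Returns 0 if the manual state change may proceed, -1 if it is refused. */
  static int checkManualStateChangeAllowed(boolean autoFailoverEnabled,
                                           boolean forceManual) {
    if (!autoFailoverEnabled) {
      return 0;  // manual HA management is the normal mode of operation
    }
    if (!forceManual) {
      System.err.println(
          "Refusing to manually manage HA state, since it may cause\n"
          + "a split-brain scenario or other incorrect state.\n"
          + "If you are very sure you know what you are doing, please\n"
          + "specify the forcemanual flag.");
      return -1;
    }
    // forcemanual was given: warn loudly, but allow the operation.
    System.err.println(
        "WARN: Proceeding with manual HA state management even though\n"
        + "automatic failover is enabled.");
    return 0;
  }

  public static void main(String[] args) {
    // Auto-HA enabled and no --forcemanual: the command is refused.
    int rc = checkManualStateChangeAllowed(true, false);
    if (rc != 0) {
      System.exit(rc);  // exit(-1) shows up in the shell as status 255
    }
  }
}
{code}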
- I verified that {{-getServiceState}} and {{-checkHealth}} continue to work.
- I verified that the {{-forcemanual}} flag worked:
{code}
$ ./bin/hdfs haadmin -transitionToStandby -forcemanual nn1
12/04/09 16:12:38 WARN ha.HAAdmin: Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at todd-w510/127.0.0.1:8021
{code}
(also verified for {{-transitionToActive}} and {{-failover}})
- Verified that {{start-dfs.sh}} starts the ZKFCs on both of my configured NNs when auto-HA is enabled, and that {{stop-dfs.sh}} stops them. Discovered the trivial bug HDFS-3234 in the process. (See the config-flag sketch below.)
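For reference, both the haadmin guard and the {{start-dfs.sh}} behavior hinge on the auto-HA config flag, which is scoped by nameservice. Here's a minimal sketch of reading such a flag with the {{Configuration}} API; the key name is an assumption for the example, not necessarily the one introduced by this patch:
{code}
// Illustrative sketch of reading a per-nameservice auto-HA flag with the
// Configuration API. The key name below is an assumption for the example.
import org.apache.hadoop.conf.Configuration;

public class AutoHaFlagCheck {
  // Hypothetical key, scoped by nameservice as described in the issue.
  private static final String AUTO_HA_KEY = "dfs.ha.automatic-failover.enabled";

  /** Prefer the nameservice-scoped key, falling back to the global one. */
  static boolean isAutoHaEnabled(Configuration conf, String nsId) {
    return conf.getBoolean(AUTO_HA_KEY + "." + nsId,
        conf.getBoolean(AUTO_HA_KEY, false));
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean(AUTO_HA_KEY + ".nameserviceId1", true);
    System.out.println(isAutoHaEnabled(conf, "nameserviceId1"));  // true
    System.out.println(isAutoHaEnabled(conf, "otherNs"));         // false
  }
}
{code}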
----
Next, I modified my config to set the auto failover flag to false.
- Verified that {{start-dfs.sh}} doesn't try to start ZKFCs.
- Verified that if I try to start a ZKFC anyway, it bails (see the sketch after this list):
{code}
12/04/09 16:19:12 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode nameserviceId1.nn2
12/04/09 16:19:12 FATAL ha.ZKFailoverController: Automatic failover is not enabled for NameNode at todd-w510/127.0.0.1:8022. Please ensure that automatic failover is enabled in the configuration before running the ZK failover controller.
{code}
- Verified that the {{haadmin}} commands all function without the {{-forcemanual}} flag specified.
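The ZKFC-side startup guard shown above boils down to something like this sketch (illustrative only, not the actual {{ZKFailoverController}} code; names are made up):
{code}
// Illustrative sketch of the ZKFC-side startup guard -- not the actual
// ZKFailoverController code; names are made up for the example.
import org.apache.hadoop.conf.Configuration;

public class ZkfcStartupGuard {
  // Same hypothetical key as in the earlier sketch.
  private static final String AUTO_HA_KEY = "dfs.ha.automatic-failover.enabled";

  /** Refuses to start the failover controller if auto failover is disabled. */
  static void checkAutoFailoverEnabled(Configuration conf, String target) {
    if (!conf.getBoolean(AUTO_HA_KEY, false)) {
      throw new IllegalStateException("Automatic failover is not enabled for "
          + target + ". Please ensure that automatic failover is enabled in "
          + "the configuration before running the ZK failover controller.");
    }
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();  // flag left unset -> disabled
    try {
      checkAutoFailoverEnabled(conf, "NameNode at 127.0.0.1:8022");
    } catch (IllegalStateException e) {
      // Corresponds to the FATAL log and abort shown in the test output above.
      System.err.println("FATAL: " + e.getMessage());
      System.exit(1);
    }
  }
}
{code}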
> Auto-HA: add a config to enable auto-HA, which disables manual FC
> -----------------------------------------------------------------
>
> Key: HADOOP-8247
> URL: https://issues.apache.org/jira/browse/HADOOP-8247
> Project: Hadoop Common
> Issue Type: Improvement
> Components: auto-failover, ha
> Affects Versions: Auto Failover (HDFS-3042)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hadoop-8247.txt, hadoop-8247.txt, hadoop-8247.txt,
> hadoop-8247.txt
>
>
> Currently, if automatic failover is set up and running, and the user uses the
> "haadmin -failover" command, he or she can end up putting the system in an
> inconsistent state, where the state in ZK disagrees with the actual state of
> the world. To fix this, we should add a config flag which is used to enable
> auto-HA. When this flag is set, we should disallow use of the haadmin command
> to initiate failovers. We should refuse to run ZKFCs when the flag is not
> set. Of course, this flag should be scoped by nameservice.