[
https://issues.apache.org/jira/browse/HADOOP-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon updated HADOOP-8247:
--------------------------------
Attachment: hadoop-8247.txt
Here's a preliminary patch for this issue. It still needs a little
cleanup/javadoc/etc, but I wanted to make sure people agree this is the right
direction before I finish it up.
Here's a summary of the change:
- Add a new flag, dfs.ha.automatic-failover.enabled, which can be set
per-nameservice or globally
- Add a new RequestInfo structure as a parameter to all the HAServiceProtocol
methods. It currently has just one field, which indicates what type of client
the request is on behalf of: a user (manual CLI failover), a ZKFC (automatic
failover), or USER_FORCE -- indicating a user who wants to bypass this safety
check.
- In the NN, if auto-failover is enabled, disallow HA requests from users. If
it is not enabled, disallow HA requests from ZKFCs.
- In the ZKFC, disallow startup if auto-failover is disabled
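For reference, enabling the flag globally would look something like the
following in hdfs-site.xml. The property name is taken from the summary above;
the exact key format of the per-nameservice variant is an assumption here, not
part of this patch's description:

```xml
<!-- Global form of the flag named above. A per-nameservice variant
     (e.g. with a nameservice-id suffix) is described in the summary,
     but its exact key format is an assumption in this sketch. -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```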
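The gating described in the last three bullets can be sketched roughly as
below. All names here (RequestSource, RequestInfo fields, checkRequest) are
illustrative stand-ins for whatever the patch actually defines, and the
exception messages are paraphrased from the logs later in this comment:

```java
// Hypothetical sketch of the request-source gating described above.
// Class, enum, and method names are illustrative, not the patch's
// actual identifiers.
public class HaGatingSketch {

    // Who is making the HA request.
    enum RequestSource { REQUEST_BY_USER, REQUEST_BY_USER_FORCED, REQUEST_BY_ZKFC }

    // Minimal stand-in for the RequestInfo structure: one field,
    // indicating what type of client the request is on behalf of.
    static class RequestInfo {
        final RequestSource source;
        RequestInfo(RequestSource source) { this.source = source; }
    }

    /**
     * Throws if the request source is not allowed given the
     * auto-failover configuration. The ZKFC-side startup check is the
     * same idea: refuse REQUEST_BY_ZKFC when auto-failover is disabled.
     */
    static void checkRequest(boolean autoFailoverEnabled, RequestInfo req) {
        switch (req.source) {
            case REQUEST_BY_USER:
                if (autoFailoverEnabled) {
                    throw new IllegalStateException(
                        "Manual HA control for this NameNode is disallowed, "
                        + "because automatic HA is enabled.");
                }
                break;
            case REQUEST_BY_ZKFC:
                if (!autoFailoverEnabled) {
                    throw new IllegalStateException(
                        "Automatic failover is not enabled for this NameNode.");
                }
                break;
            case REQUEST_BY_USER_FORCED:
                // Always allowed: the user explicitly bypasses the safety check.
                break;
        }
    }

    public static void main(String[] args) {
        // Manual failover is fine when auto-failover is off.
        checkRequest(false, new RequestInfo(RequestSource.REQUEST_BY_USER));
        // Forced requests bypass the check in either mode.
        checkRequest(true, new RequestInfo(RequestSource.REQUEST_BY_USER_FORCED));
        System.out.println("gating sketch ok");
    }
}
```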
In addition to the unit tests, I ran the following manual tests, on a secure
cluster.
1) Did not enable the auto-failover config
2) Ran failovers using the haadmin command successfully
3) Tried to run bin/hdfs zkfc, got expected error:
{code}
12/04/05 20:53:38 INFO tools.DFSZKFailoverController: Failover controller
configured for NameNode nameserviceId1.nn1
12/04/05 20:53:38 FATAL ha.ZKFailoverController: Automatic failover is not
enabled for NameNode at todd-w510/127.0.0.1:8021. Please ensure that automatic
failover is enabled in the configuration before running the ZK failover
controller.
{code}
4) Enabled auto-failover in my config, but left the NNs running. Got an error
when the ZKFC tried to make the local node active. TODO for a future JIRA: the
ZKFC could abort at this point, when it sees an AccessControlException, since
that is indicative of misconfiguration.
5) Restarted NNs, so they picked up the new config.
6) Ran the ZKFC; it successfully made one of the NNs active. Verified automatic
failover behavior by killing one of the NNs.
7) Ran manual failover command, got expected error:
{code}
12/04/05 20:58:31 ERROR ha.FailoverController: Unable to get service state for
NameNode at todd-w510/127.0.0.1:8022: Manual HA control for this NameNode is
disallowed, because automatic HA is enabled.
{code}
----
Open questions: should we allow the non-mutative commands like
{{monitorHealth}} and {{getServiceState}} to run when auto-failover is
configured? My thinking is that we probably should. If so, should we keep the
RequestInfo parameter on those calls, or only include RequestInfo for the
calls that trigger transitions?
> Auto-HA: add a config to enable auto-HA, which disables manual FC
> -----------------------------------------------------------------
>
> Key: HADOOP-8247
> URL: https://issues.apache.org/jira/browse/HADOOP-8247
> Project: Hadoop Common
> Issue Type: Improvement
> Components: auto-failover, ha
> Affects Versions: Auto Failover (HDFS-3042)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hadoop-8247.txt
>
>
> Currently, if automatic failover is set up and running, and the user uses the
> "haadmin -failover" command, he or she can end up putting the system in an
> inconsistent state, where the state in ZK disagrees with the actual state of
> the world. To fix this, we should add a config flag which is used to enable
> auto-HA. When this flag is set, we should disallow use of the haadmin command
> to initiate failovers. We should refuse to run ZKFCs when the flag is not
> set. Of course, this flag should be scoped by nameservice.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira