[jira] [Commented] (HDFS-2185) HA: HDFS portion of ZK-based FailoverController

Todd Lipcon (Commented) (JIRA) Wed, 04 Apr 2012 16:00:48 -0700

[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246819#comment-13246819 ]


Todd Lipcon commented on HDFS-2185:
-----------------------------------

bq. I'm not quite sure how it can be guaranteed. The NN cannot tell who 
issues a transition, right?

My plan was to add an enum flag to the RPCs like {{transitionToActive}} and 
{{transitionToStandby}} that would indicate who sent it. For example 
"CLI_FAILOVER", "ZKFC_FAILOVER", or "FORCE". The force option would be there so 
that if the admin *really* knows what he/she is doing, they could override the 
safety check. Otherwise the haadmin commands can prevent users from 
accidentally shooting themselves in the foot.
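To make the proposal concrete, here is a minimal sketch of the safety check such a flag would enable. It assumes a hypothetical enum named `RequestSource` carrying the three values mentioned above and a hypothetical `isAllowed` check on the NN side; neither is an existing Hadoop API. The policy shown (reject plain CLI transitions while automatic failover is on, let FORCE bypass the check) is one reading of the plan, not the final design.

```java
public class TransitionGuard {
    // Hypothetical flag values taken from the comment; not an existing Hadoop API.
    public enum RequestSource { CLI_FAILOVER, ZKFC_FAILOVER, FORCE }

    /**
     * Decides whether a transitionToActive/transitionToStandby request from the
     * given source should be accepted. While automatic failover is enabled, plain
     * CLI requests are rejected so an admin cannot accidentally race the ZKFC;
     * FORCE bypasses the check entirely.
     */
    public static boolean isAllowed(RequestSource source, boolean autoFailoverEnabled) {
        switch (source) {
            case FORCE:
                return true;                  // admin explicitly overrides the safety check
            case ZKFC_FAILOVER:
                return true;                  // this sketch does not restrict ZKFC requests
            case CLI_FAILOVER:
                return !autoFailoverEnabled;  // manual transitions only when no ZKFC is in charge
            default:
                throw new IllegalArgumentException("Unknown source: " + source);
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(RequestSource.CLI_FAILOVER, true));   // false
        System.out.println(isAllowed(RequestSource.FORCE, true));          // true
        System.out.println(isAllowed(RequestSource.CLI_FAILOVER, false));  // true
    }
}
```

Keeping the check on the NN side (rather than only in haadmin) means any client, not just the CLI, has to declare its intent before changing HA state.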


bq. I still think it makes sense for ops to have an option to turn auto 
failover on/off on demand. In case of ZKFC issues, we would still have an 
alternative way to bypass it. However, I'm not sure whether it would help ops or 
confuse them.

That's a good point; it's useful for emergency situations. I think we can solve 
this with docs, though: if you want to stop automatic failovers, you need to 
first shut down the standby ZKFCs, then the active ZKFC. If you bring them down 
in the other order, it won't break things, but you might get a failover in the 
process. I think adding a programmatic way to do this is a future improvement.
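The documented shutdown order can be captured in a few lines. This is a minimal sketch, assuming a hypothetical `Zkfc` descriptor that records each host and whether its NameNode is currently active; how that state is discovered is left out. Stopping standbys first means that when the active ZKFC finally releases its lock, no standby ZKFC is left to win the election and trigger a failover.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ZkfcShutdownOrder {
    /** Hypothetical descriptor of one ZKFC and the HA state of the NN it manages. */
    static final class Zkfc {
        final String host;
        final boolean managesActive;

        Zkfc(String host, boolean managesActive) {
            this.host = host;
            this.managesActive = managesActive;
        }

        String host() { return host; }
        boolean managesActive() { return managesActive; }
    }

    /** Returns the ZKFCs in the order they should be stopped: standbys first, active last. */
    static List<Zkfc> shutdownOrder(List<Zkfc> zkfcs) {
        List<Zkfc> ordered = new ArrayList<>(zkfcs);
        // Boolean ordering puts false (standby) before true (active); List.sort is stable.
        ordered.sort(Comparator.comparing(Zkfc::managesActive));
        return ordered;
    }

    public static void main(String[] args) {
        List<Zkfc> zkfcs = List.of(new Zkfc("nn1", true), new Zkfc("nn2", false));
        for (Zkfc z : shutdownOrder(zkfcs)) {
            System.out.println("stop ZKFC on " + z.host());  // nn2 first, then nn1
        }
    }
}
```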
                
> HA: HDFS portion of ZK-based FailoverController
> -----------------------------------------------
>
>                 Key: HDFS-2185
>                 URL: https://issues.apache.org/jira/browse/HDFS-2185
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: auto-failover, ha
>    Affects Versions: 0.24.0, 0.23.3
>            Reporter: Eli Collins
>            Assignee: Todd Lipcon
>             Fix For: Auto failover (HDFS-3042)
>
>         Attachments: Failover_Controller.jpg, hdfs-2185.txt, hdfs-2185.txt, 
> hdfs-2185.txt, hdfs-2185.txt, hdfs-2185.txt, zkfc-design.pdf, 
> zkfc-design.pdf, zkfc-design.pdf, zkfc-design.pdf, zkfc-design.tex
>
>
> This jira is for a ZK-based FailoverController daemon. The FailoverController 
> is a separate daemon from the NN that does the following:
> * Initiates leader election (via ZK) when necessary
> * Performs health monitoring (aka failure detection)
> * Performs fail-over (standby to active and active to standby transitions)
> * Heartbeats to ensure liveness
> It should have the same/similar interface as the Linux HA RM to aid 
> pluggability.
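The responsibilities listed in the issue description can be illustrated with a toy control loop. This is a minimal sketch, not the ZKFC design: the ZooKeeper election is replaced by a hypothetical in-memory `FakeElection` lock, transitions are just recorded states, and fencing, ZK sessions and heartbeats are not modeled. It shows only the core decision: a healthy NN competes for the lock and becomes active if it wins, while an unhealthy NN gives up the lock and drops to standby.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the failover controller decision loop (no ZooKeeper, no fencing). */
public class ToyFailoverController {
    enum HaState { STANDBY, ACTIVE }

    /** Hypothetical stand-in for the ZK lock: at most one holder at a time. */
    static final class FakeElection {
        private String holder;  // null when nobody holds the lock

        synchronized boolean tryAcquire(String id) {
            if (holder == null) {
                holder = id;
            }
            return id.equals(holder);
        }

        synchronized void release(String id) {
            if (id.equals(holder)) {
                holder = null;
            }
        }
    }

    private final String id;
    private final FakeElection election;
    private HaState state = HaState.STANDBY;
    final List<HaState> history = new ArrayList<>();

    ToyFailoverController(String id, FakeElection election) {
        this.id = id;
        this.election = election;
    }

    /** One iteration: the health-check result drives election participation and transitions. */
    void tick(boolean nnHealthy) {
        if (nnHealthy && election.tryAcquire(id)) {
            transitionTo(HaState.ACTIVE);  // won (or still holds) the election
        } else {
            if (!nnHealthy) {
                election.release(id);      // an unhealthy NN must give up the lock
            }
            transitionTo(HaState.STANDBY);
        }
    }

    private void transitionTo(HaState target) {
        if (state != target) {
            state = target;
            history.add(target);
        }
    }

    HaState state() { return state; }

    public static void main(String[] args) {
        FakeElection zk = new FakeElection();
        ToyFailoverController a = new ToyFailoverController("nn1", zk);
        ToyFailoverController b = new ToyFailoverController("nn2", zk);
        a.tick(true);  b.tick(true);   // nn1 wins the election, nn2 stays standby
        a.tick(false); b.tick(true);   // nn1 fails its health check, nn2 takes over
        System.out.println(a.state() + " " + b.state());  // STANDBY ACTIVE
    }
}
```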

