[ 
https://issues.apache.org/jira/browse/AMBARI-12488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639747#comment-14639747
 ] 

Hudson commented on AMBARI-12488:
---------------------------------

SUCCESS: Integrated in Ambari-branch-2.1 #265 (See 
[https://builds.apache.org/job/Ambari-branch-2.1/265/])
AMBARI-12488. RU - Use haadmin failover command instead of killing ZKFC during 
upgrade/downgrade (alejandro) (afernandez: 
http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=4532b5192c4677165a321778b92d8fe14530024b)
* ambari-server/src/main/resources/stacks/HDP/2.2/upgrades/upgrade-2.2.xml


> RU - Use haadmin failover command instead of killing ZKFC during 
> upgrade/downgrade
> ----------------------------------------------------------------------------------
>
>                 Key: AMBARI-12488
>                 URL: https://issues.apache.org/jira/browse/AMBARI-12488
>             Project: Ambari
>          Issue Type: Story
>          Components: ambari-server
>    Affects Versions: 2.0.0
>            Reporter: Alejandro Fernandez
>            Assignee: Alejandro Fernandez
>              Labels: rolling_upgrade
>             Fix For: 2.1.1
>
>         Attachments: AMBARI-12488.patch, AMBARI-12488.v1.patch, 
> AMBARI-12488.v2.patch
>
>
> Currently RU orchestration during upgrade/downgrade kills ZKFC on the active 
> NameNode to initiate a failover to standby. We should instead use the 
> failover command.
> E.g.,
> {code}
> su hdfs -c 'hdfs haadmin -failover nn1 nn2'
> {code}
> Where nn1 is the current NameNode, if it is the active one, and nn2 is the 
> remaining NameNode.
> This is safer than killing zkfc on the active namenode because this command 
> first tries to gracefully transition a NameNode to the Standby state. If this 
> fails, the fencing methods (as configured by dfs.ha.fencing.methods) will be 
> attempted until one succeeds. After this process the second NameNode will be 
> transitioned to the Active state. 
> It also shortens the long waits between the ZKFC kill, failure detection 
> kicking in after a timeout, and the NN then becoming active.
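
For illustration only, the "fail over only if the current NameNode is active" check from the description could be sketched in shell roughly as follows. This is not the code from the attached patch. The service IDs nn1 and nn2 are the example names used above, HDFS_CMD and the two function names are hypothetical helpers, and the sketch assumes it already runs as the hdfs user (the example above wraps the command in su).

```shell
# Sketch only, not Ambari's actual orchestration code.
# HDFS_CMD is a hypothetical override so the logic can be tried without a cluster.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Print the HA state ("active" or "standby") of the given NameNode service ID.
nn_state() {
  $HDFS_CMD haadmin -getServiceState "$1"
}

# Fail over from $1 to $2, but only when $1 is currently the active NameNode.
failover_if_active() {
  local current="$1" other="$2" state
  # If the state cannot be queried, give up rather than guess.
  state="$(nn_state "$current")" || return 1
  if [ "$state" = "active" ]; then
    # Graceful transition first; fencing methods are attempted if that fails.
    $HDFS_CMD haadmin -failover "$current" "$other"
  else
    echo "$current is $state; no failover needed"
  fi
}
```

Calling failover_if_active nn1 nn2 on the host of the NameNode about to be restarted would then trigger the failover only when nn1 is active, and do nothing otherwise.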



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
