[ https://issues.apache.org/jira/browse/HDFS-14961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984338#comment-16984338 ]

Vinayakumar B commented on HDFS-14961:
--------------------------------------

Thanks [~ayushtkn] for the analysis and the fix.
The fix looks good to me. +1.

There is already a check in the HealthMonitor thread that calls quitElection
when the NameNode state is found to be OBSERVER.
{code:java}
if (changedState == HAServiceState.OBSERVER) {
  elector.quitElection(true);
  serviceState = HAServiceState.OBSERVER;
  return;
}
{code}
However, this monitoring is asynchronous and runs every second. In the case of
a manual transition, the state changes directly in the NameNode, so ZKFC only
syncs the new state during its next monitoring round and then quits the
election.
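To make the timing gap concrete, here is a simplified, hypothetical model (not Hadoop code). NameNodeModel, ZkfcModel and healthCheckTick() are invented names, and the local HAServiceState enum only mirrors the Hadoop enum of the same name. It sketches how, after a manual transition, ZKFC's cached state stays stale and the node stays in the election until the next health-check round runs the OBSERVER branch quoted above.

```java
// Hypothetical model of the race described above; this is not Hadoop code.
class ObserverRaceSketch {
  // Mirrors the values of Hadoop's HAServiceState enum for readability only.
  enum HAServiceState { INITIALIZING, ACTIVE, STANDBY, OBSERVER, STOPPING }

  // Stands in for the NameNode: an admin command changes its state directly.
  static class NameNodeModel {
    HAServiceState state = HAServiceState.STANDBY;
  }

  // Stands in for ZKFC, which only learns about state changes when it polls.
  static class ZkfcModel {
    final NameNodeModel nn;
    HAServiceState serviceState; // ZKFC's cached view of the NameNode state
    boolean inElection = true;   // assume this node joined the election earlier

    ZkfcModel(NameNodeModel nn) {
      this.nn = nn;
      this.serviceState = nn.state;
    }

    // One round of the periodic health check (about once per second per the comment).
    void healthCheckTick() {
      HAServiceState changedState = nn.state;
      if (changedState == HAServiceState.OBSERVER) {
        inElection = false; // plays the role of elector.quitElection(true)
        serviceState = HAServiceState.OBSERVER;
        return;
      }
      // Simplification: other states are just recorded in this model.
      serviceState = changedState;
    }
  }

  public static void main(String[] args) {
    NameNodeModel nn = new NameNodeModel();
    ZkfcModel zkfc = new ZkfcModel(nn);

    // Manual transition (e.g. transitionToObserver): ZKFC is not involved.
    nn.state = HAServiceState.OBSERVER;
    System.out.println("before tick: cached=" + zkfc.serviceState
        + ", inElection=" + zkfc.inElection);

    // Only the next health-check round syncs ZKFC and makes it leave the election.
    zkfc.healthCheckTick();
    System.out.println("after tick: cached=" + zkfc.serviceState
        + ", inElection=" + zkfc.inElection);
  }
}
```

Between the manual transition and the next tick, this model still reports the old cached state and an active election participation, which is the window in which stale election instructions can reach the Observer.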

As [~ferhui] suggested, also checking the state before joining the election
doesn't hurt. It can be added in a separate Improvement Jira, as [~ayushtkn]
already said.
{code:java}
if (serviceState != HAServiceState.OBSERVER) {
  elector.joinElection(targetToData(localTarget));
}
{code}
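A minimal sketch of the suggested guard, under stated assumptions: JoinGuardSketch, the Elector stand-in and maybeJoinElection() are hypothetical, and the byte[] argument is only a placeholder for targetToData(localTarget). It shows that a node whose cached state is OBSERVER never issues a join request, while a Standby still does.

```java
// Hypothetical sketch of the "check before joining" guard; not Hadoop code.
class JoinGuardSketch {
  // Mirrors the values of Hadoop's HAServiceState enum for readability only.
  enum HAServiceState { INITIALIZING, ACTIVE, STANDBY, OBSERVER, STOPPING }

  // Minimal stand-in for the leader elector; it only counts join requests.
  static class Elector {
    int joinCalls = 0;

    void joinElection(byte[] data) {
      joinCalls++;
    }
  }

  final Elector elector = new Elector();
  HAServiceState serviceState; // ZKFC's cached view of the local NameNode

  JoinGuardSketch(HAServiceState initial) {
    this.serviceState = initial;
  }

  // Joins the election unless the cached state says this node is an Observer.
  void maybeJoinElection(byte[] targetData) {
    if (serviceState != HAServiceState.OBSERVER) {
      elector.joinElection(targetData);
    }
  }

  public static void main(String[] args) {
    JoinGuardSketch standby = new JoinGuardSketch(HAServiceState.STANDBY);
    JoinGuardSketch observer = new JoinGuardSketch(HAServiceState.OBSERVER);
    standby.maybeJoinElection(new byte[0]);
    observer.maybeJoinElection(new byte[0]);
    System.out.println(standby.elector.joinCalls + " " + observer.elector.joinCalls); // prints "1 0"
  }
}
```

In this sketch the guard reads the cached serviceState, so it only helps once that state has been synced; a manual transition is still picked up by the periodic health check.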
 

> [SBN read] Prevent ZKFC changing Observer Namenode state
> --------------------------------------------------------
>
>                 Key: HDFS-14961
>                 URL: https://issues.apache.org/jira/browse/HDFS-14961
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Íñigo Goiri
>            Assignee: Ayush Saxena
>            Priority: Major
>         Attachments: HDFS-14961-01.patch, HDFS-14961-02.patch, 
> HDFS-14961-03.patch, HDFS-14961-04.patch, ZKFC-TEST-14961.patch
>
>
> HDFS-14130 made ZKFC aware of the Observer NameNode and hence allows ZKFC
> to run alongside the Observer NameNode.
> The Observer NameNode isn't supposed to be part of the ZKFC election process.
> But if the NameNode was part of the election before it was turned into an
> Observer by the transitionToObserver command, the ZKFC still sends
> instructions to the NameNode as a result of that earlier participation, and
> sometimes changes the state of the Observer to Standby.
> This is also the reason for the failure in TestDFSZKFailoverController.
> TestDFSZKFailoverController has been consistently failing with a timeout
> while waiting in testManualFailoverWithDFSHAAdmin(), in particular at
> {{waitForHAState(1, HAServiceState.OBSERVER);}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
