[ https://issues.apache.org/jira/browse/HDFS-14961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984338#comment-16984338 ]
Vinayakumar B commented on HDFS-14961:
--------------------------------------

Thanks [~ayushtkn] for the analysis and the fix.

The fix looks good to me. +1.

There is already a check in the HealthMonitor thread that calls quitElection when the NameNode state is found to be OBSERVER.
{code:java}
if (changedState == HAServiceState.OBSERVER) {
  elector.quitElection(true);
  serviceState = HAServiceState.OBSERVER;
  return;
}{code}
But this monitoring is asynchronous and happens every 1 second. In the case of a manual transition, the state can change directly in the NameNode, so ZKFC syncs with it during monitoring and then quits the election.

As [~ferhui] suggested, checking the state before joining the election also doesn't hurt. It can be added in a separate Improvement Jira, as [~ayushtkn] already said.
{code:java}
if (serviceState != HAServiceState.OBSERVER) {
  elector.joinElection(targetToData(localTarget));
}{code}

> [SBN read] Prevent ZKFC changing Observer Namenode state
> --------------------------------------------------------
>
>                 Key: HDFS-14961
>                 URL: https://issues.apache.org/jira/browse/HDFS-14961
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Íñigo Goiri
>            Assignee: Ayush Saxena
>            Priority: Major
>         Attachments: HDFS-14961-01.patch, HDFS-14961-02.patch,
> HDFS-14961-03.patch, HDFS-14961-04.patch, ZKFC-TEST-14961.patch
>
>
> HDFS-14130 made ZKFC aware of the Observer Namenode and hence allows ZKFC
> to run alongside the Observer Node.
> The Observer Namenode isn't supposed to be part of the ZKFC election process.
> But if the Namenode was part of the election before turning into Observer
> via the transitionToObserver command, the ZKFC still sends instructions to the
> Namenode as a result of its previous participation and sometimes tends to
> change the state of the Observer to Standby.
> This is also the reason for the failure in TestDFSZKFailoverController.
> TestDFSZKFailoverController has been consistently failing with a time out
> waiting in testManualFailoverWithDFSHAAdmin().
> In particular {{waitForHAState(1, HAServiceState.OBSERVER);}}.
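
For illustration only, the two checks discussed in the comment (quitting the election once OBSERVER is seen, and not re-joining while in OBSERVER) can be put together in a small self-contained toy. This is a sketch under assumptions, not the ZKFC code: the class ObserverGuardSketch, the RecordingElector stub, and the method names onServiceStateChanged and recheckElectability are hypothetical stand-ins, the enum is a local copy of the state names, and joinElection takes no argument here because targetToData(localTarget) is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

public class ObserverGuardSketch {
    // Local copy of the state names used in the comment; the real enum lives in Hadoop.
    enum HAServiceState { ACTIVE, STANDBY, OBSERVER }

    // Hypothetical stand-in for the elector: it only records what was asked of it.
    static class RecordingElector {
        final List<String> calls = new ArrayList<>();
        boolean inElection = false;

        void joinElection() {
            inElection = true;
            calls.add("join");
        }

        void quitElection(boolean flag) {
            // The boolean is passed through exactly as in the quoted snippet.
            inElection = false;
            calls.add("quit(" + flag + ")");
        }
    }

    // Hypothetical, heavily reduced controller holding the cached service state.
    static class Controller {
        final RecordingElector elector = new RecordingElector();
        HAServiceState serviceState = HAServiceState.STANDBY;

        // Invoked by the periodic (asynchronous) health check with the reported state.
        void onServiceStateChanged(HAServiceState changedState) {
            if (changedState == HAServiceState.OBSERVER) {
                elector.quitElection(true);
                serviceState = HAServiceState.OBSERVER;
                return;
            }
            serviceState = changedState;
        }

        // The extra guard suggested in the comment: never join while OBSERVER.
        void recheckElectability() {
            if (serviceState != HAServiceState.OBSERVER) {
                elector.joinElection();
            }
        }
    }

    public static void main(String[] args) {
        Controller c = new Controller();
        c.recheckElectability();                          // STANDBY: joins
        c.onServiceStateChanged(HAServiceState.OBSERVER); // monitor sees OBSERVER: quits
        c.recheckElectability();                          // OBSERVER: guard skips the join
        System.out.println(c.elector.calls + " inElection=" + c.elector.inElection);
    }
}
```

The toy does not model the window between a manual transition and the next monitoring pass, during which the cached serviceState may still be stale; the guard only helps once that cached state has been updated.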