[ 
https://issues.apache.org/jira/browse/HADOOP-10668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HADOOP-10668:
-----------------------------
    Attachment: HADOOP-10668.patch

The likely cause is the check for whether a node has reached the expected 
state. {{ZKFailoverController}} keeps its own {{serviceState}}, and an HA 
service such as {{DummyHAService}} keeps its own state as well. 
{{MiniZKFCCluster}}'s {{waitForHAState}} uses the {{DummyHAService}} state to 
decide whether the transition has completed. But when fencing is involved, the 
to-be-elected active directly calls the old active's {{transitionToStandby}} 
method. Thus {{DummyHAService}}'s state can be set to standby before 
{{ZKFailoverController}}'s state is updated.
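
To illustrate the race, here is a minimal standalone sketch. It is not the 
attached patch and uses no Hadoop classes: {{StateWaitSketch}}, 
{{waitForBoth}}, and the two {{AtomicReference}} variables are hypothetical 
stand-ins for the {{DummyHAService}} state and the {{ZKFailoverController}} 
state. It assumes one way a test could avoid the race is to wait until both 
states agree, rather than checking the service state alone.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical illustration only; names do not come from Hadoop.
class StateWaitSketch {
    enum State { ACTIVE, STANDBY }

    /**
     * Polls until both suppliers report the expected state or the timeout
     * elapses. Returns true only if both agree on the expected state.
     */
    static boolean waitForBoth(Supplier<State> serviceState,
                               Supplier<State> controllerState,
                               State expected,
                               long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (System.nanoTime() < deadline) {
            if (serviceState.get() == expected && controllerState.get() == expected) {
                return true;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // One final check after the deadline has passed.
        return serviceState.get() == expected && controllerState.get() == expected;
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for the service's state and the controller's view of it.
        AtomicReference<State> service = new AtomicReference<>(State.ACTIVE);
        AtomicReference<State> controller = new AtomicReference<>(State.ACTIVE);

        // Fencing: the new active flips the old service's state directly.
        service.set(State.STANDBY);

        // A check on the service alone already passes, although the
        // controller has not yet seen the change.
        System.out.println("service only: " + (service.get() == State.STANDBY));
        System.out.println("both, before callback: "
            + waitForBoth(service::get, controller::get, State.STANDBY, 50));

        // Simulated callback that updates the controller some time later.
        Thread callback = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            controller.set(State.STANDBY);
        });
        callback.start();
        System.out.println("both, after callback: "
            + waitForBoth(service::get, controller::get, State.STANDBY, 2000));
        callback.join();
    }
}
```

In the sketch, the service-only check succeeds immediately after the simulated 
fencing call, while the combined check succeeds only once the delayed callback 
has updated the controller's state.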

The patch doesn't change the fact that {{ZKFailoverController}}'s state is only 
updated when it receives a notification from the ZK callback. So even with the 
fix, the following error might still appear in the log. That is ok: 
{{ZKFailoverController}}'s state will eventually change to standby.
 
{noformat}
2015-01-12 15:08:16,497 ERROR ha.ZKFailoverController 
(ZKFailoverController.java:verifyChangedServiceState(828)) - Local service 
DummyHAService #1 has changed the serviceState to standby. Expected was active. 
Quitting election marking fencing necessary.
{noformat}

> TestZKFailoverControllerStress#testExpireBackAndForth occasionally fails
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-10668
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10668
>             Project: Hadoop Common
>          Issue Type: Test
>          Components: test
>    Affects Versions: 3.0.0
>            Reporter: Ted Yu
>              Labels: test
>         Attachments: HADOOP-10668.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/4018//testReport/org.apache.hadoop.ha/TestZKFailoverControllerStress/testExpireBackAndForth/
>  :
> {code}
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>       at org.apache.zookeeper.server.DataTree.getData(DataTree.java:648)
>       at org.apache.zookeeper.server.ZKDatabase.getData(ZKDatabase.java:371)
>       at 
> org.apache.hadoop.ha.MiniZKFCCluster.expireActiveLockHolder(MiniZKFCCluster.java:199)
>       at 
> org.apache.hadoop.ha.MiniZKFCCluster.expireAndVerifyFailover(MiniZKFCCluster.java:234)
>       at 
> org.apache.hadoop.ha.TestZKFailoverControllerStress.testExpireBackAndForth(TestZKFailoverControllerStress.java:84)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
