[https://issues.apache.org/jira/browse/HDFS-14961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977287#comment-16977287]
Ayush Saxena commented on HDFS-14961:
-------------------------------------
Well, I tried to check this. I put some sleeps in the test and got it to reproduce a
couple of times (it doesn't reproduce that easily).
I feel the reason is this: the ZKFC is running on the node that we try to
convert to Observer. Before we send the command to turn it into an Observer, that
NN may be in the election, so it gets a becomeStandby() command from the ZKFC. If the current
state is OBSERVER, this command converts it into STANDBY.
The test passes if becomeStandby() comes before we give the command to transition
to OBSERVER, or if becomeStandby() comes after the test has captured the OBSERVER
state.
As of now, an OBSERVER can't participate in the election, but it can still take instructions
from the ZKFC as a result of its previous participation.
It seems to me to be a kind of race condition, not just a test issue.
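To make the interleavings described above concrete, here is a minimal Java sketch. It is a deterministic replay and not Hadoop code. The `State` and `Event` enums and `testSeesObserver` are hypothetical names. A single `TEST_CHECKS_STATE` event stands in for the polling done by `waitForHAState(1, HAServiceState.OBSERVER)`. The sketch assumes, per the analysis above, that a ZKFC-initiated becomeStandby() leaves the NN in STANDBY whether it was OBSERVER or already STANDBY.

```java
import java.util.Arrays;
import java.util.List;

public class ObserverRaceSketch {
    // Hypothetical, simplified HA states (names mirror HAServiceState; not the real class).
    enum State { ACTIVE, STANDBY, OBSERVER }

    // Hypothetical events that can reach the NN under test.
    enum Event { TRANSITION_TO_OBSERVER, ZKFC_BECOME_STANDBY, TEST_CHECKS_STATE }

    /** Replays the events in order; returns true if the test's check ever sees OBSERVER. */
    static boolean testSeesObserver(List<Event> order) {
        State state = State.STANDBY;
        boolean seenObserver = false;
        for (Event e : order) {
            switch (e) {
                case TRANSITION_TO_OBSERVER:
                    state = State.OBSERVER;
                    break;
                case ZKFC_BECOME_STANDBY:
                    // Assumption: a leftover ZKFC instruction turns OBSERVER back into STANDBY.
                    state = State.STANDBY;
                    break;
                case TEST_CHECKS_STATE:
                    seenObserver = seenObserver || state == State.OBSERVER;
                    break;
            }
        }
        return seenObserver;
    }

    public static void main(String[] args) {
        // becomeStandby() arrives before the transition: the test passes.
        System.out.println(testSeesObserver(Arrays.asList(
            Event.ZKFC_BECOME_STANDBY, Event.TRANSITION_TO_OBSERVER, Event.TEST_CHECKS_STATE)));
        // becomeStandby() arrives after the test has captured OBSERVER: the test passes.
        System.out.println(testSeesObserver(Arrays.asList(
            Event.TRANSITION_TO_OBSERVER, Event.TEST_CHECKS_STATE, Event.ZKFC_BECOME_STANDBY)));
        // becomeStandby() lands between the transition and the check: OBSERVER is never seen.
        System.out.println(testSeesObserver(Arrays.asList(
            Event.TRANSITION_TO_OBSERVER, Event.ZKFC_BECOME_STANDBY, Event.TEST_CHECKS_STATE)));
    }
}
```

Running it prints `true`, `true`, `false`. Only the third ordering hides OBSERVER from the test, which in the real test would surface as the timeout in waitForHAState().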
I still need to think about how to handle this situation. I'm not sure there is a simple
way to distinguish whether a call to becomeStandby() comes from the ZKFC or from elsewhere.
I may be wrong... I'm not very familiar with the ZKFC stuff either. Just tried!!!
Any opinions or ideas on this?
> TestDFSZKFailoverController fails consistently
> ----------------------------------------------
>
> Key: HDFS-14961
> URL: https://issues.apache.org/jira/browse/HDFS-14961
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Íñigo Goiri
> Priority: Major
>
> TestDFSZKFailoverController has been consistently failing with a timeout
> while waiting in testManualFailoverWithDFSHAAdmin(), in particular at
> {{waitForHAState(1, HAServiceState.OBSERVER);}}.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)