[
https://issues.apache.org/jira/browse/HDFS-14961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978106#comment-16978106
]
Ayush Saxena commented on HDFS-14961:
-------------------------------------
Thanks [~ferhui] for taking a look. Initially that was the intent: ZKFC wasn't
supposed to run for the ONN. Since HDFS-14130, however, it is allowed. That
change made ZKFC Observer-aware, and if you check, every case apart from this
race condition seems to be handled: the ONN does not participate in the
election, and so on.
Of course, stopping the third ZKFC would make the test pass, but I think it
would defeat the purpose for which it was added. After HDFS-14130, ZKFC is
expected to leave the ONN alone and not try to convert it to an SNN. Check the
description of HDFS-14130:
{noformat}
Need to fix automatic failover with ZKFC. Currently it does not know about
ObserverNodes trying to convert them to SBNs.
{noformat}
If I just fix the test by stopping the ZKFC for the third NN (the ONN), the
effective rule becomes: ZKFC can run alongside an ONN, but only if the ZKFC
starts after the ONN is already up. That way the ZKFC never sees the NN in a
state prior to OBSERVER, because seeing such a state lets it participate in
the election.
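To make the ordering problem concrete, here is a single-threaded simulation. It is a hypothetical model, not Hadoop code: FakeNameNode, FakeZkfc and run() are made-up names, and it assumes, as a simplification of the real elector, that a ZKFC decides once at startup whether to join the election and later tells its NN to become standby if it does not win.

```java
/**
 * Hypothetical single-threaded model of the ordering problem. None of these
 * classes are Hadoop APIs; names and behaviour are simplified for illustration.
 */
public class ZkfcRaceSketch {

    enum State { STANDBY, ACTIVE, OBSERVER }

    /** Stand-in for a NameNode whose HA state two independent callers can change. */
    static final class FakeNameNode {
        State state = State.STANDBY;
    }

    /** Stand-in for a ZKFC that decides once, when it starts, whether to join the election. */
    static final class FakeZkfc {
        private final FakeNameNode nn;
        private boolean inElection;

        FakeZkfc(FakeNameNode nn) {
            this.nn = nn;
        }

        void start() {
            // The decision uses the state seen right now; an ONN is skipped.
            inElection = nn.state != State.OBSERVER;
        }

        void onLostElection() {
            // A participant that did not win asks its NN to be standby.
            if (inElection) {
                nn.state = State.STANDBY;
            }
        }
    }

    /** Plays one ordering of "ZKFC starts" vs "admin makes the NN an observer". */
    static State run(boolean zkfcStartsFirst) {
        FakeNameNode nn = new FakeNameNode();
        FakeZkfc zkfc = new FakeZkfc(nn);
        if (zkfcStartsFirst) {
            zkfc.start();                // sees STANDBY, joins the election
            nn.state = State.OBSERVER;   // admin command transitions the NN
        } else {
            nn.state = State.OBSERVER;   // admin command transitions the NN
            zkfc.start();                // sees OBSERVER, stays out
        }
        zkfc.onLostElection();
        return nn.state;
    }

    public static void main(String[] args) {
        System.out.println("ZKFC first:  " + run(true));   // prints STANDBY
        System.out.println("admin first: " + run(false));  // prints OBSERVER
    }
}
```

When the ZKFC starts first, the NN that the admin command turned into an Observer ends up back in STANDBY, which is consistent with a wait for OBSERVER never succeeding.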
The present fix just ensures the ONN doesn't get instructed by ZKFC. Since the
ONN isn't supposed to participate in the election anyway, this seems safe
enough.
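A rough sketch of the idea behind the fix, assuming the guard sits at the point where the controller is about to instruct its NN (this does not claim to mirror where HDFS-14961-01.patch actually puts the check). ObserverGuardSketch and handleBecomeStandby() are hypothetical names.

```java
/**
 * Hypothetical sketch of the guard: the controller re-reads the NN's state at
 * the moment it would instruct it, and leaves an Observer alone even if it
 * joined the election earlier. Names are illustrative, not Hadoop's classes.
 */
public class ObserverGuardSketch {

    enum State { STANDBY, ACTIVE, OBSERVER }

    /**
     * Returns the NN state after the controller handles "become standby".
     * {@code current} is the state read just before acting, not a cached one.
     */
    static State handleBecomeStandby(State current) {
        if (current == State.OBSERVER) {
            // An ONN is not supposed to take part in the election: do nothing.
            return State.OBSERVER;
        }
        return State.STANDBY;
    }

    public static void main(String[] args) {
        System.out.println(handleBecomeStandby(State.OBSERVER)); // prints OBSERVER
        System.out.println(handleBecomeStandby(State.ACTIVE));   // prints STANDBY
    }
}
```

Checking the live state at instruction time, rather than relying on what the ZKFC saw when it started, is what makes the start-up ordering irrelevant in this model.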
[~elgoiri]
bq. If I understand correctly, this is not a flaky test but the logic is not
correct.
Yes, there seems to be a problem with the logic itself.
bq. Here we are preventing ZKFC making an OBSERVER NN STANDBY, right?
Yes, we are preventing ZKFC from turning the ONN into an SNN, since the ONN
isn't supposed to participate in the election.
bq. Do we have any place where we explain the flow?
Flow as in the ZKFC election part? I don't think there is much detailed
documentation of that process, and my own knowledge of the flow is limited.
The ZKFC managing the NameNode's state runs in parallel with, and
independently of, the DFSAdmin commands that instruct state changes.
bq. We should change the title and adapt the description accordingly.
Sure, I will change them accordingly.
> TestDFSZKFailoverController fails consistently
> ----------------------------------------------
>
> Key: HDFS-14961
> URL: https://issues.apache.org/jira/browse/HDFS-14961
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Íñigo Goiri
> Assignee: Ayush Saxena
> Priority: Major
> Attachments: HDFS-14961-01.patch
>
>
> TestDFSZKFailoverController has been consistently failing with a time out
> waiting in testManualFailoverWithDFSHAAdmin(). In particular
> {{waitForHAState(1, HAServiceState.OBSERVER);}}.
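The failure shows up as a timeout rather than an assertion error presumably because a wait like waitForHAState(...) polls until the expected state appears or a deadline passes. A generic sketch of such a poll loop, assuming that behaviour; waitForState and its parameters are hypothetical and not the test's actual helper.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical polling helper in the spirit of the test's waitForHAState(...). */
public class WaitForStateSketch {

    enum State { STANDBY, ACTIVE, OBSERVER }

    /**
     * Polls {@code current} every {@code intervalMs} until it returns {@code expected};
     * throws TimeoutException once {@code timeoutMs} has elapsed without a match.
     */
    static void waitForState(Supplier<State> current, State expected,
                             long intervalMs, long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (current.get() != expected) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("state never became " + expected);
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // A NN that a ZKFC keeps pushing back to STANDBY never reaches OBSERVER.
        try {
            waitForState(() -> State.STANDBY, State.OBSERVER, 10, 100);
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```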