[
https://issues.apache.org/jira/browse/HDFS-14961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978295#comment-16978295
]
Ayush Saxena commented on HDFS-14961:
-------------------------------------
I added a log line for the HA state just before the check you mentioned. Here is
the relevant part of the log:
{noformat}
2019-11-20 15:52:47,061 [IPC Server handler 2 on default port 10024-EventThread]
INFO ha.ZKFailoverController (ZKFailoverController.java:becomeStandby(491)) -
ZK Election indicated that NameNode at localhost/127.0.0.1:10022 should become
standby
2019-11-20 15:52:47,065 [IPC Server handler 2 on default port 10022] WARN
namenode.NameNode (NameNode.java:checkHaStateChange(2106)) - Allowing manual HA
control from 127.0.0.1 even though automatic HA is enabled, because the user
specified the force flag
2019-11-20 15:52:47,066 [IPC Server handler 2 on default port 10022] INFO
namenode.FSNamesystem (FSNamesystem.java:stopStandbyServices(1445)) - Stopping
services started for standby state
2019-11-20 15:52:47,070 [Edit log tailer] WARN ha.EditLogTailer
(EditLogTailer.java:doWork(528)) - Edit log tailer interrupted: sleep
interrupted
2019-11-20 15:52:47,070 [IPC Server handler 2 on default port 10022] INFO
namenode.FSNamesystem (FSNamesystem.java:startStandbyServices(1402)) - Starting
services required for observer state
2019-11-20 15:52:47,077 [IPC Server handler 2 on default port 10022] INFO
ha.EditLogTailer (EditLogTailer.java:<init>(205)) - Will roll logs on active
node every 120 seconds.
2019-11-20 15:52:47,078 [IPC Server handler 3 on default port 10022] INFO
namenode.FSNamesystem (FSNamesystem.java:stopStandbyServices(1445)) - Stopping
services started for standby state
2019-11-20 15:52:47,080 [IPC Server handler 3 on default port 10022] INFO
namenode.FSNamesystem (FSNamesystem.java:startStandbyServices(1402)) - Starting
services required for standby state
2019-11-20 15:52:47,084 [IPC Server handler 3 on default port 10022] INFO
ha.EditLogTailer (EditLogTailer.java:<init>(205)) - Will roll logs on active
node every 120 seconds.
2019-11-20 15:52:47,085 [IPC Server handler 2 on default port
10024-EventThread] INFO ha.ZKFailoverController
(ZKFailoverController.java:becomeStandby(496)) - Successfully transitioned
NameNode at localhost/127.0.0.1:10022 to standby state
2019-11-20 15:52:56,642 [ZKFC Delay timer #0] INFO ha.ZKFailoverController
(ZKFailoverController.java:recheckElectability(802)) - STATE is : active
2019-11-20 15:52:56,642 [ZKFC Delay timer #0] INFO ha.ActiveStandbyElector
(ActiveStandbyElector.java:joinElection(300)) - Already in election. Not
re-connecting.
2019-11-20 15:52:56,870 [ZKFC Delay timer #0] INFO ha.ZKFailoverController
(ZKFailoverController.java:recheckElectability(802)) - STATE is : standby
2019-11-20 15:52:56,871 [ZKFC Delay timer #0] INFO ha.ActiveStandbyElector
(ActiveStandbyElector.java:joinElection(300)) - Already in election. Not
re-connecting.
2019-11-20 15:53:12,123 [Time-limited test] INFO zookeeper.JUnit4ZKTestRunner
(JUnit4ZKTestRunner.java:evaluate(99)) - TEST METHOD FAILED
testManualFailoverWithDFSHAAdmin
{noformat}
> Prevent ZKFC changing Observer Namenode state
> ---------------------------------------------
>
> Key: HDFS-14961
> URL: https://issues.apache.org/jira/browse/HDFS-14961
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Íñigo Goiri
> Assignee: Ayush Saxena
> Priority: Major
> Attachments: HDFS-14961-01.patch, HDFS-14961-02.patch
>
>
> HDFS-14130 made ZKFC aware of the Observer Namenode and hence allows ZKFC to
> run alongside the Observer node.
> The Observer Namenode isn't supposed to be part of the ZKFC election process.
> But if the Namenode was part of the election before being turned into an
> Observer by the transitionToObserver command, ZKFC still sends instructions to
> the Namenode as a result of its previous participation, and sometimes changes
> the state of the Observer to Standby.
> This is also the reason for the failure in TestDFSZKFailoverController.
> TestDFSZKFailoverController has been consistently failing with a timeout
> while waiting in testManualFailoverWithDFSHAAdmin(), in particular at
> {{waitForHAState(1, HAServiceState.OBSERVER);}}.
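To make the failure mode in the description concrete, here is a minimal, self-contained sketch of the kind of guard the issue title asks for: a standby request coming from the election path is ignored when the node is already an Observer. This is not the attached patch and not the real ZKFC code. The names ObserverGuardSketch, HaState, FakeNameNode and transitionToStandbyUnlessObserver are hypothetical stand-ins invented for illustration; only the idea (do not let the election move an Observer to Standby) comes from the issue.

```java
public class ObserverGuardSketch {

    /** Hypothetical stand-in for the HA states mentioned in the issue. */
    enum HaState { ACTIVE, STANDBY, OBSERVER }

    /** Hypothetical, in-memory stand-in for a NameNode; not the Hadoop class. */
    static final class FakeNameNode {
        private HaState state = HaState.STANDBY;

        synchronized HaState getState() {
            return state;
        }

        /** Models a manual transition such as transitionToObserver. */
        synchronized void transitionTo(HaState target) {
            state = target;
        }

        /**
         * Models the standby request that arrives via the election path.
         * The check and the state change happen under one lock, so a
         * concurrent manual transition cannot slip in between them.
         *
         * @return true if the state was changed to STANDBY, false if the
         *         request was ignored because the node is an Observer
         */
        synchronized boolean transitionToStandbyUnlessObserver() {
            if (state == HaState.OBSERVER) {
                // Assumption: an Observer should stay out of the election,
                // so the stale instruction is dropped.
                return false;
            }
            state = HaState.STANDBY;
            return true;
        }
    }

    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode();

        // The node is manually moved to Observer first...
        nn.transitionTo(HaState.OBSERVER);

        // ...then a late standby instruction from the election arrives.
        boolean changed = nn.transitionToStandbyUnlessObserver();

        System.out.println(changed + " " + nn.getState()); // prints "false OBSERVER"
    }
}
```

Without the guard, the second call would overwrite OBSERVER with STANDBY, which matches the sequence in the log above where handler 3 starts standby services right after handler 2 started observer services.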
--
This message was sent by Atlassian Jira
(v8.3.4#803005)