[ https://issues.apache.org/jira/browse/HDFS-14961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969480#comment-16969480 ]
Ayush Saxena commented on HDFS-14961:
-------------------------------------
Well, I was able to reproduce this.
Sharing the part of the log that I suspect could be the reason (it indicates the NameNode was able to transition to Observer, then moved to standby):
{noformat}
You have specified the --forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.
It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.
You may abort safely by answering 'n' or hitting ^C now.
Are you sure you want to continue? (Y or N) 2019-11-07 23:43:24,936 [Listener at localhost/10022] WARN ha.HAAdmin (HAAdmin.java:checkManualStateManagementOK(269)) - Proceeding with manual HA state management even though
automatic failover is enabled for NameNode at localhost/127.0.0.1:10022
2019-11-07 23:43:24,940 [IPC Server handler 5 on default port 10022] WARN namenode.NameNode (NameNode.java:checkHaStateChange(2106)) - Allowing manual HA control from 127.0.0.1 even though automatic HA is enabled, because the user specified the force flag
2019-11-07 23:43:24,941 [IPC Server handler 5 on default port 10022] INFO namenode.FSNamesystem (FSNamesystem.java:stopStandbyServices(1445)) - Stopping services started for standby state
2019-11-07 23:43:24,941 [Edit log tailer] WARN ha.EditLogTailer (EditLogTailer.java:doWork(528)) - Edit log tailer interrupted: sleep interrupted
2019-11-07 23:43:24,941 [IPC Server handler 5 on default port 10022] INFO namenode.FSNamesystem (FSNamesystem.java:startStandbyServices(1402)) - Starting services required for observer state
2019-11-07 23:43:24,944 [IPC Server handler 5 on default port 10022] INFO ha.EditLogTailer (EditLogTailer.java:<init>(205)) - Will roll logs on active node every 120 seconds.
2019-11-07 23:43:24,955 [IPC Server handler 2 on default port 10024-EventThread] INFO ha.ZKFailoverController (ZKFailoverController.java:becomeStandby(491)) - ZK Election indicated that NameNode at localhost/127.0.0.1:10022 should become standby
2019-11-07 23:43:24,964 [IPC Server handler 9 on default port 10022] INFO namenode.FSNamesystem (FSNamesystem.java:stopStandbyServices(1445)) - Stopping services started for standby state
2019-11-07 23:43:24,966 [Edit log tailer] WARN ha.EditLogTailer (EditLogTailer.java:doWork(528)) - Edit log tailer interrupted: sleep interrupted
2019-11-07 23:43:24,966 [IPC Server handler 9 on default port 10022] INFO namenode.FSNamesystem (FSNamesystem.java:startStandbyServices(1402)) - Starting services required for standby state
2019-11-07 23:43:24,971 [IPC Server handler 9 on default port 10022] INFO ha.EditLogTailer (EditLogTailer.java:<init>(205)) - Will roll logs on active node every 120 seconds.
2019-11-07 23:43:24,972 [IPC Server handler 2 on default port 10024-EventThread] INFO ha.ZKFailoverController (ZKFailoverController.java:becomeStandby(496)) - Successfully transitioned NameNode at localhost/127.0.0.1:10022 to standby state
2019-11-07 23:43:34,499 [ZKFC Delay timer #0] INFO ha.ActiveStandbyElector (ActiveStandbyElector.java:joinElection(300)) - Already in election. Not re-connecting.
2019-11-07 23:43:34,728 [ZKFC Delay timer #0] INFO ha.ActiveStandbyElector (ActiveStandbyElector.java:joinElection(300)) - Already in election. Not re-connecting.
2019-11-07 23:43:50,021 [Listener at localhost/10022] INFO zookeeper.JUnit4ZKTestRunner (JUnit4ZKTestRunner.java:evaluate(99)) - TEST METHOD FAILED testManualFailoverWithDFSHAAdmin
{noformat}
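To make the suspected ordering explicit, here is a minimal, hypothetical model. It is plain Java, not Hadoop code: FakeNameNode, HaState and replayLoggedSequence are invented names. It only replays the two transitions in the order the log shows them: the forced manual transition to OBSERVER, then the ZKFC's becomeStandby after the ZK election result. Whichever call runs last wins, so the NameNode ends up in STANDBY.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Hadoop code: FakeNameNode, HaState and
// replayLoggedSequence are invented names used only to model the log.
public class ObserverRaceSketch {

    // Simplified stand-in for HAServiceState.
    enum HaState { STANDBY, OBSERVER, ACTIVE }

    // Stand-in for one NameNode that records who changed its state.
    static final class FakeNameNode {
        private HaState state = HaState.STANDBY;
        private final List<String> history = new ArrayList<>();

        synchronized void transitionTo(HaState target, String caller) {
            history.add(caller + ": " + state + " -> " + target);
            state = target;
        }

        synchronized HaState getState() {
            return state;
        }

        synchronized List<String> getHistory() {
            return new ArrayList<>(history);
        }
    }

    // Replays the two transitions in the order the log shows them.
    static FakeNameNode replayLoggedSequence() {
        FakeNameNode nn = new FakeNameNode();
        // 1. DFSHAAdmin with --forcemanual moves the NameNode to OBSERVER.
        nn.transitionTo(HaState.OBSERVER, "DFSHAAdmin --forcemanual");
        // 2. The ZKFC, still in the election, calls becomeStandby afterwards.
        nn.transitionTo(HaState.STANDBY, "ZKFC becomeStandby");
        return nn;
    }

    public static void main(String[] args) {
        FakeNameNode nn = replayLoggedSequence();
        nn.getHistory().forEach(System.out::println);
        System.out.println("final state: " + nn.getState());
    }
}
```

Running main prints the two transitions followed by "final state: STANDBY". The model deliberately ignores timing; in the log, the Observer state lasted only from 23:43:24,941 until the ZKFC acted at 23:43:24,955.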
> TestDFSZKFailoverController fails consistently
> ----------------------------------------------
>
> Key: HDFS-14961
> URL: https://issues.apache.org/jira/browse/HDFS-14961
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Íñigo Goiri
> Priority: Major
>
> TestDFSZKFailoverController has been consistently failing with a timeout
> while waiting in testManualFailoverWithDFSHAAdmin(), in particular at
> {{waitForHAState(1, HAServiceState.OBSERVER);}}.
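A hypothetical polling wait in the spirit of waitForHAState shows why this surfaces as a timeout rather than an immediate failure. This is not the actual test helper; waitForState, HaState, the poll interval and the timeout are assumptions made for illustration. If the NameNode was briefly OBSERVER but has already been moved back to STANDBY, every poll sees STANDBY until the deadline passes.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch, NOT the Hadoop test helper: waitForState, HaState
// and the timings below are assumptions used for illustration only.
public class WaitForStateSketch {

    // Simplified stand-in for HAServiceState.
    enum HaState { STANDBY, OBSERVER, ACTIVE }

    // Polls current until it equals expected, or throws once the timeout expires.
    static void waitForState(Supplier<HaState> current, HaState expected,
                             long pollMillis, long timeoutMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            HaState observed = current.get();
            if (observed == expected) {
                return;
            }
            // Overflow-safe deadline check for System.nanoTime().
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                        "expected " + expected + " but state is still " + observed);
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // The ZKFC has already moved the NameNode back to STANDBY, so the
        // OBSERVER state being waited for never comes back.
        Supplier<HaState> afterZkfcOverride = () -> HaState.STANDBY;
        try {
            waitForState(afterZkfcOverride, HaState.OBSERVER, 50, 500);
            System.out.println("reached OBSERVER");
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

Running main prints "timed out: expected OBSERVER but state is still STANDBY", which is consistent with the symptom described above: the wait expires instead of reporting the wrong state right away.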
--
This message was sent by Atlassian Jira
(v8.3.4#803005)