[ 
https://issues.apache.org/jira/browse/HDFS-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377866#comment-17377866
 ] 

Denis Serduik commented on HDFS-4957:
-------------------------------------

We've been bitten by the exact same scenario: 3 JournalNodes, 2 of which are 
co-located with the NNs. See the logs below:
{noformat}
2021-07-09 06:34:53,932 INFO namenode.RedundantEditLogInputStream: 
Fast-forwarding stream 
'http://XX-XXXXX-XXX.internal.cloudapp.net:8480/getJournal?jid=conduit&segmentTxId=449&storageInfo=-65%3A326724551%3A1625810812563%3ACID-b9e44cb5-91e4-4a55-85dc-6442fa9e44e4&inProgressOk=true'
 to transaction ID 449
2021-07-09 06:34:53,970 INFO namenode.FSImage: Edits file 
http://XX-XXXXX-XXX.internal.cloudapp.net:8480/getJournal?jid=conduit&segmentTxId=449&storageInfo=-65%3A326724551%3A1625810812563%3ACID-b9e44cb5-91e4-4a55-85dc-6442fa9e44e4&inProgressOk=true,
 
http://YYY-YYYYY-YYYY.internal.cloudapp.net:8480/getJournal?jid=conduit&segmentTxId=449&storageInfo=-65%3A326724551%3A1625810812563%3ACID-b9e44cb5-91e4-4a55-85dc-6442fa9e44e4&inProgressOk=true
 of size 42 edits # 2 loaded in 0 seconds
2021-07-09 06:36:22,127 INFO namenode.FSNamesystem: Stopping services started 
for standby state
2021-07-09 06:36:22,128 WARN ha.EditLogTailer: Edit log tailer interrupted
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:469)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:399)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:416)
        at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:484)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:412)
2021-07-09 06:36:22,130 INFO namenode.FSNamesystem: Starting services required 
for active state
2021-07-09 06:36:22,131 ERROR namenode.NameNode: Error encountered requiring NN 
shutdown. Shutting down immediately.
java.lang.IllegalArgumentException: Unable to construct journal, 
qjournal://XX-XXXXX-XXX.internal.cloudapp.net:8485;YYY-YYYYY-YYYY.internal.cloudapp.net:8485;ZZ-ZZZZZ-ZZZZ.internal.cloudapp.net:8485/conduit
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1824)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:294)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:259)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1223)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1890)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1749)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1742)
        at 
org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
        at 
org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1811)
        ... 19 more
Caused by: java.net.UnknownHostException: 
XX-XXXXX-XXX.internal.cloudapp.net:8485
        at 
org.apache.hadoop.hdfs.server.common.Util.getAddressesList(Util.java:378)
        at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:388)
        at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:170)
        at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:126)
        at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:105)
        ... 24 more
2021-07-09 06:36:22,132 INFO util.ExitUtil: Exiting with status 1: 
java.lang.IllegalArgumentException: Unable to construct journal, 
qjournal://XX-XXXXX-XXX.internal.cloudapp.net:8485;YYY-YYYYY-YYYY.internal.cloudapp.net:8485;ZZ-ZZZZZ-ZZZZ.internal.cloudapp.net:8485/conduit
2021-07-09 06:36:22,133 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at YYY-YYYYY-YYYY/10.9.0.5
************************************************************/
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of 
HADOOP_PREFIX.
2021-07-09 06:36:22,837 INFO namenode.NameNode: STARTUP_MSG: 

{noformat}
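
For illustration only: the trace above shows the UnknownHostException being thrown from Util.getAddressesList while QuorumJournalManager builds its loggers, so one unresolvable host aborts the whole transition. Below is a minimal sketch of a more tolerant way to parse the qjournal authority. It is not Hadoop code and not a patch for this issue; the class JournalAddressSketch, the Resolver hook, and parseTolerantly are all hypothetical names made up for this example.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Hadoop.
final class JournalAddressSketch {

    /** Hypothetical hook so the name lookup can be replaced in tests. */
    public interface Resolver {
        InetAddress lookup(String host) throws UnknownHostException;
    }

    /** Outcome of a tolerant parse: nodes that resolved and nodes that did not. */
    public static final class Result {
        public final List<InetSocketAddress> resolved = new ArrayList<>();
        public final List<InetSocketAddress> unresolved = new ArrayList<>();

        /** True when a strict majority of the configured nodes resolved. */
        public boolean hasQuorum() {
            int total = resolved.size() + unresolved.size();
            return resolved.size() > total / 2;
        }
    }

    /** Parses "host:port;host:port;..." and keeps going past unknown hosts. */
    public static Result parseTolerantly(String authority, Resolver resolver) {
        Result result = new Result();
        for (String part : authority.split(";")) {
            String entry = part.trim();
            int colon = entry.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Missing port: " + entry);
            }
            String host = entry.substring(0, colon);
            int port = Integer.parseInt(entry.substring(colon + 1));
            try {
                result.resolved.add(new InetSocketAddress(resolver.lookup(host), port));
            } catch (UnknownHostException e) {
                // Keep the node as an unresolved address instead of failing the
                // whole list, so a caller could retry the lookup later.
                result.unresolved.add(InetSocketAddress.createUnresolved(host, port));
            }
        }
        return result;
    }
}
```

Against real DNS this could be called as JournalAddressSketch.parseTolerantly(authority, InetAddress::getByName). With 3 configured JournalNodes and 1 unresolvable name, hasQuorum() still returns true, since 2 of 3 is a majority; whether a NameNode could actually proceed would also depend on those two nodes being reachable.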

> NameNode failover should not fail because a DNS entry for a quorum node 
> cannot be resolved
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-4957
>                 URL: https://issues.apache.org/jira/browse/HDFS-4957
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: qjm
>    Affects Versions: 2.3.0, 2.6.0
>            Reporter: Colin McCabe
>            Assignee: John Zhuge
>            Priority: Major
>
> When a StandbyNameNode is becoming active, we should not bail out because a 
> DNS entry for a quorum node cannot be resolved.  Currently it does fail in 
> this scenario, with a message like this:
> {code}
> 2013-07-03 21:28:40,576 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services 
> required for active state
> 2013-07-03 21:28:40,579 FATAL 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Error encountered requiring 
> NN shutdown. Shutting down immediately.
> java.lang.IllegalArgumentException: Unable to construct journal, 
> qjournal://hadoop-mm:8485;hadoop-nn-0:8485;hadoop-nn-1:8485/hadoop
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1254)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:193)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:722)
> <etc>
> {code}
> reported by Matt Bookman



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

