[https://issues.apache.org/jira/browse/HDFS-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333864#comment-15333864]
XingFeng Shen commented on HDFS-10536:
--------------------------------------
The Standby NN throws the following exception:
{code}
2016-06-16 20:27:51,456 | INFO | Edit log tailer | Triggering log roll on remote NameNode | EditLogTailer.java:296
2016-06-16 20:27:51,531 | WARN | Edit log tailer | Failed to reach remote node: RemoteNameNodeInfo [nnId=19, ipcAddress=szv1000044725/10.120.176.172:25000, httpAddress=https://szv1000044725:25003], retrying with remaining remote NNs | EditLogTailer.java:431
2016-06-16 20:27:51,535 | WARN | Edit log tailer | Failed to reach remote node: RemoteNameNodeInfo [nnId=19, ipcAddress=szv1000044725/10.120.176.172:25000, httpAddress=https://szv1000044725:25003], retrying with remaining remote NNs | EditLogTailer.java:431
2016-06-16 20:27:51,538 | WARN | Edit log tailer | Failed to reach remote node: RemoteNameNodeInfo [nnId=19, ipcAddress=szv1000044725/10.120.176.172:25000, httpAddress=https://szv1000044725:25003], retrying with remaining remote NNs | EditLogTailer.java:431
2016-06-16 20:27:51,538 | WARN | Edit log tailer | Unable to trigger a roll of the active NN | EditLogTailer.java:316
java.io.IOException: Cannot find any valid remote NN to service request!
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$MultipleNameNodeProxy.call(EditLogTailer.java:439)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:298)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$800(EditLogTailer.java:70)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:355)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:324)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:341)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1691)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:443)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:337)
{code}
After one NameNode becomes active, the standby NN still cannot trigger a log roll, because the variable "nnLoopCount" is still 3 and is never reset to 0:
{code}
private NamenodeProtocol getActiveNodeProxy() throws IOException {
  if (cachedActiveProxy == null) {
    while (true) {
      // if we have reached the max loop count, quit by returning null
      if ((nnLoopCount / nnCount) >= maxRetries) {
        return null;
      }
      ......
    }
  }
  assert cachedActiveProxy != null;
  return cachedActiveProxy;
}
{code}
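To make the failure mode concrete, here is a minimal, hypothetical Java model of the retry loop. It is not the real EditLogTailer code: each remote NN is reduced to an active/standby flag, the class and method names are invented for illustration, and it assumes a single remote NN with a retry limit of 3, which is consistent with the three WARN lines in the log above. Because the counter lives in a field that is shared across calls, the guard `(nnLoopCount / nnCount) >= maxRetries` keeps returning null even after the remote NN turns active.

```java
import java.io.IOException;

/**
 * Hypothetical, simplified model of EditLogTailer.MultipleNameNodeProxy.
 * Each remote NN is reduced to a boolean: true = active, false = standby.
 */
public class StuckLoopCountDemo {
    private final boolean[] remoteActive; // state of each remote NN
    private final int maxRetries;         // allowed full passes over all remote NNs
    private int nnLoopCount = 0;          // shared across calls and never reset (the bug)

    public StuckLoopCountDemo(int nnCount, int maxRetries) {
        this.remoteActive = new boolean[nnCount];
        this.maxRetries = maxRetries;
    }

    public void setActive(int nn, boolean active) {
        remoteActive[nn] = active;
    }

    /** Same guard as the excerpt: returns null once the retry budget is spent. */
    private Integer getActiveNodeProxy() {
        int nnCount = remoteActive.length;
        if ((nnLoopCount / nnCount) >= maxRetries) {
            return null;
        }
        return nnLoopCount % nnCount; // round-robin over the remote NNs
    }

    /** Returns the index of the NN that served the request. */
    public int call() throws IOException {
        Integer nn;
        while ((nn = getActiveNodeProxy()) != null) {
            if (remoteActive[nn]) {
                return nn; // the request was served by an active NN
            }
            nnLoopCount++; // standby NN: retry with the remaining remote NNs
        }
        throw new IOException("Cannot find any valid remote NN to service request!");
    }

    public int getNnLoopCount() {
        return nnLoopCount;
    }

    public static void main(String[] args) {
        StuckLoopCountDemo tailer = new StuckLoopCountDemo(1, 3);
        for (int attempt = 1; attempt <= 2; attempt++) {
            try {
                System.out.println("served by NN " + tailer.call());
            } catch (IOException e) {
                System.out.println("attempt " + attempt + ": " + e.getMessage()
                        + " (nnLoopCount=" + tailer.getNnLoopCount() + ")");
            }
            tailer.setActive(0, true); // the remote NN becomes active after the first attempt
        }
    }
}
```

Both attempts print the exception with nnLoopCount=3: the second attempt fails immediately, without contacting the now-active NN, because 3 / 1 >= 3.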
> Standby NN can not trigger log roll after EditLogTailer thread failed 3 times
> in EditLogTailer.triggerActiveLogRoll method.
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-10536
> URL: https://issues.apache.org/jira/browse/HDFS-10536
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: auto-failover
> Reporter: XingFeng Shen
> Priority: Critical
>
> When all NameNodes are in standby, EditLogTailer retries 3 times to
> trigger a log roll, then fails and throws the exception "Cannot find any
> valid remote NN to service request!". After one NameNode becomes active, the
> standby NN still cannot trigger a log roll, because the variable
> "nnLoopCount" is still 3 and is never reset to 0.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)