[
https://issues.apache.org/jira/browse/HDFS-14271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767454#comment-16767454
]
Erik Krogen edited comment on HDFS-14271 at 2/13/19 6:45 PM:
-------------------------------------------------------------
Hey [~jojochuang], thanks for reporting. Here is what I found from reading the
logic in {{RetryInvocationHandler#log()}} and verified via testing (let
S = Standby, A = Active, O = Observer, with the letters listed in the order the
NameNodes appear in the configs):
* If you have a setup like SAS or ASS, you will get no log message, because
at most one failover occurs
* If you have a setup like SSA, you will get a log message, because two
failovers occur
* If you have a setup like OAS, you will get no log message, because only one
failover occurs
* If you have a setup like OSA, you will get a log message, because two
failovers occur
I agree that this is a bug, but I don't think it's related to the
read-from-standby feature, given that it can also occur in a setup with no
Observer at all, only Standby nodes (e.g. SSA). It looks to me like a bug in
the multiple standby feature (HDFS-6440), which did not update the assumption
here that there are only 2 NameNodes.
Please let me know if your testing shows different results than what I have
discussed.
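The five orderings above can be reproduced with a small toy model. This is only a sketch, not Hadoop code: the class {{FailoverLogSketch}} and the method {{loggedFailures}} are hypothetical names, and the rule it encodes (stay silent for the first failover, log every later one) is an assumption inferred from the observations above rather than a copy of {{RetryInvocationHandler#log()}}.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the observed logging behaviour; hypothetical, not Hadoop code. */
final class FailoverLogSketch {

    /**
     * Walks the NameNodes in config order for a WRITE call and returns the
     * failures that would be logged, ASSUMING the client stays silent for the
     * first failover and logs every later one.
     *
     * @param order one character per NameNode: 'A', 'S' or 'O'
     */
    static List<String> loggedFailures(String order) {
        List<String> logged = new ArrayList<>();
        int failovers = 0; // failovers already performed before this attempt
        for (char state : order.toCharArray()) {
            if (state == 'A') {
                return logged; // the Active serves the write; the call succeeds
            }
            // Standby and Observer both reject WRITE with a StandbyException.
            if (failovers != 0) {
                logged.add("StandbyException from " + state + " at index " + failovers);
            }
            failovers++;
        }
        throw new IllegalArgumentException("no Active NameNode in: " + order);
    }

    public static void main(String[] args) {
        for (String order : new String[] {"SAS", "ASS", "SSA", "OAS", "OSA"}) {
            boolean logs = !loggedFailures(order).isEmpty();
            System.out.println(order + " -> logs a message: " + logs);
        }
    }
}
```

Under that assumption only SSA and OSA produce a message, which matches the list above. The model treats Standby and Observer identically for writes, which is why the Observer is not essential to triggering the message.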
> [SBN read] StandbyException is logged if Observer is the first NameNode
> -----------------------------------------------------------------------
>
> Key: HDFS-14271
> URL: https://issues.apache.org/jira/browse/HDFS-14271
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.3.0
> Reporter: Wei-Chiu Chuang
> Priority: Minor
>
> If I transition the first NameNode into Observer state, and then I create a
> file from command line, it prints the following StandbyException log message,
> as if the command failed. But it actually completed successfully:
> {noformat}
> [root@weichiu-sbsr-1 ~]# hdfs dfs -touchz /tmp/abf
> 19/02/12 16:35:17 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state observer. Visit https://s.apache.org/sbnn-error
> at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1987)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1424)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:762)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:458)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:918)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:853)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2782)
> , while invoking $Proxy4.create over [weichiu-sbsr-1.gce.cloudera.com/172.31.121.145:8020,weichiu-sbsr-2.gce.cloudera.com/172.31.121.140:8020]. Trying to failover immediately.
> {noformat}
> This is unlike the case when the first NameNode is the Standby, where this
> StandbyException is suppressed.