[
https://issues.apache.org/jira/browse/HDDS-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760869#comment-17760869
]
Szabolcs Gál commented on HDDS-3902:
------------------------------------
I couldn't reproduce the issue with the following command:
{code}
ozone freon omkg --duration 1m
{code}
> OM HA client failover switcher to a wrong OM server
> ---------------------------------------------------
>
> Key: HDDS-3902
> URL: https://issues.apache.org/jira/browse/HDDS-3902
> Project: Apache Ozone
> Issue Type: Bug
> Components: OM HA
> Reporter: Marton Elek
> Assignee: Szabolcs Gál
> Priority: Blocker
> Labels: 0.7.0
>
> Found this problem with the PR/branch HDDS-3878, but it seems to be
> independent.
> 1. ozone sh volume create /vol1 works well with HA
> 2. ozone freon omkg (rpc client) doesn't work
> {code}
> ozone freon omkg | grep "Failing over"
> 2020-06-30 14:15:31 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy
> to index: 1, nodeId: om2
> 2020-06-30 14:15:31 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy
> to index: 2, nodeId: om3
> 2020-06-30 14:15:34 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy
> to index: 0, nodeId: omNodeIdDummy
> {code}
> om2 seems to be the leader, but for some reason the failover logic switches
> back to an unknown node (?)
> {code}
> 2020-06-30 14:16:35 DEBUG OMFailoverProxyProvider:271 - Failing over OM proxy
> to index: 2, nodeId: om3
> 2020-06-30 14:16:35 DEBUG Client:63 - getting client out of cache:
> org.apache.hadoop.ipc.Client@f5acb9d
> 2020-06-30 14:16:35 DEBUG Client:497 - The ping interval is 60000 ms.
> 2020-06-30 14:16:35 DEBUG Client:795 - Connecting to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862
> 2020-06-30 14:16:35 DEBUG Client:1074 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root:
> starting, having connections 3
> 2020-06-30 14:16:35 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root
> sending #0
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got
> value #0
> 2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took
> 439ms
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root
> sending #1
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got
> value #1
> 2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took 2ms
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root
> sending #2
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got
> value #2
> 2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took 1ms
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root
> sending #3
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-2.ozone-om.default.svc.cluster.local/10.42.0.175:9862 from root got
> value #3
> 2020-06-30 14:16:36 DEBUG ProtobufRpcEngine:254 - Call: submitRequest took 1ms
> 2020-06-30 14:16:36 DEBUG Client:63 - getting client out of cache:
> org.apache.hadoop.ipc.Client@f5acb9d
> 2020-06-30 14:16:36 DEBUG Groups:312 - GroupCacheLoader - load.
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #5
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #11
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #8
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #12
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #10
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #6
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #9
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #7
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #4
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1137 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root
> sending #13
> org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got
> value #5
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got
> value #8
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got
> value #11
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got
> value #10
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got
> value #12
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got
> value #7
> 2020-06-30 14:16:36 DEBUG Hadoop3OmTransport:140 - RetryProxy: OM:om1 is not
> the leader. Suggested leader is OM:om3.
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.createNotLeaderException(OzoneManagerProtocolServerSideTranslatorPB.java:198)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:141)
> at
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:74)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:113)
> at
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
> 2020-06-30 14:16:36 DEBUG Client:1191 - IPC Client (363509958) connection to
> ozone-om-0.ozone-om.default.svc.cluster.local/10.42.0.173:9862 from root got
> value #9
> 2020-06-30 14:16:36 DEBUG OMFailoverProxyProvider:299 - Incrementing OM proxy
> index to 0, nodeId: omNodeIdDummy
> {code}
> As you can see, after a few failovers om2 was finally found and a few
> requests were handled. But after that the client switched back to om0
> (???)