ASF GitHub Bot updated HDDS-12204:
----------------------------------
Labels: pull-request-available (was: )
> Improve failover logging
> ------------------------
>
> Key: HDDS-12204
> URL: https://issues.apache.org/jira/browse/HDDS-12204
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Wei-Chiu Chuang
> Assignee: Chia-Chuan Yu
> Priority: Major
> Labels: pull-request-available
>
> When an OM is unable to find the leader SCM, the failover message is not easy
> to understand.
> {noformat}
> 2025-02-03 20:50:13,520 INFO [IPC Server handler 49 on 9862]-org.apache.hadoop.io.retry.RetryInvocationHandler: com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException): Server:1eba3c81-e2b5-4daa-9767-fd81754abcb6 is not the leader. Could not determine the leader node.
>     at org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:105)
>     at org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:248)
>     at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:111)
>     at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:15776)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2899)
> , while invoking $Proxy33.send over nodeId=node1,nodeAddress=ccycloud-3.weichiu-hbase.root.comops.site/10.140.95.198:9863 after 279 failover attempts. Trying to failover after sleeping for 2000ms.
> {noformat}
> It would be better if the message logged the role of the server that cannot
> be reached (SCM in this case) and its host name, rather than just the opaque
> UUID.
> Example: log "SCM Server:1eba3c81-e2b5-4daa-9767-fd81754abcb6
> (192.168.0.1) is not the leader. Could not determine the leader node."
> instead.
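
A minimal sketch of how the proposed message could be assembled, for illustration only. The class NotLeaderMessage and its format method are hypothetical names, not part of Ozone; the real change would presumably live in or near ServerNotLeaderException, and how the role and address are obtained there is not shown in this issue.

```java
// Hypothetical helper, not an existing Ozone class: it only illustrates the
// message shape proposed in the issue description.
final class NotLeaderMessage {
    private NotLeaderMessage() { }

    /**
     * Builds a not-leader message that names the server role and address
     * alongside the Ratis peer id.
     *
     * @param role     role of the server that rejected the call, e.g. "SCM"
     * @param serverId Ratis peer id (the UUID seen in the current log)
     * @param address  host name or IP address of that server
     */
    static String format(String role, String serverId, String address) {
        return role + " Server:" + serverId + " (" + address + ")"
            + " is not the leader. Could not determine the leader node.";
    }

    public static void main(String[] args) {
        // Values taken from the example in the issue description.
        System.out.println(format(
            "SCM", "1eba3c81-e2b5-4daa-9767-fd81754abcb6", "192.168.0.1"));
    }
}
```

Keeping the UUID in the message while adding the role and address matches the example above and still lets the entry be correlated with Ratis logs.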