[ 
https://issues.apache.org/jira/browse/HDDS-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddhant Sangwan reassigned HDDS-6900:
--------------------------------------

    Assignee: Siddhant Sangwan

> SCM node went down with UndeclaredThrowableException while running container 
> balancer
> -------------------------------------------------------------------------------------
>
>                 Key: HDDS-6900
>                 URL: https://issues.apache.org/jira/browse/HDDS-6900
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM HA
>            Reporter: Nilotpal Nandi
>            Assignee: Siddhant Sangwan
>            Priority: Major
>
> An SCM node went down with an UndeclaredThrowableException while the container
> balancer was running and two other SCM nodes were shut down.
> {noformat}
> 2022-06-15 20:00:15,634 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764-GrpcLogAppender: Leader has not got in touch with Follower 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764(c-1,m0,n310, attendVote=true, lastRpcSendTime=1, lastRpcResponseTime=32843) yet, just keep nextIndex unchanged and retry.
> 2022-06-15 20:00:16,887 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> 2022-06-15 20:00:16,888 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764-GrpcLogAppender: Leader has not got in touch with Follower 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764(c-1,m0,n310, attendVote=true, lastRpcSendTime=4, lastRpcResponseTime=34097) yet, just keep nextIndex unchanged and retry.
> 2022-06-15 20:00:18,121 ERROR org.apache.ratis.server.impl.StateMachineUpdater: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF-StateMachineUpdater caught a Throwable.
> java.lang.reflect.UndeclaredThrowableException
>     at com.sun.proxy.$Proxy19.completeMove(Unknown Source)
>     at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.deleteSrcDnForMove(LegacyReplicationManager.java:1249)
>     at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.lambda$onLeaderReadyAndOutOfSafeMode$40(LegacyReplicationManager.java:1871)
>     at java.base/java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1603)
>     at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.onLeaderReadyAndOutOfSafeMode(LegacyReplicationManager.java:1850)
>     at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.notifyStatusChanged(LegacyReplicationManager.java:1649)
>     at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.notifyStatusChanged(ReplicationManager.java:375)
>     at org.apache.hadoop.hdds.scm.ha.SCMServiceManager.notifyStatusChanged(SCMServiceManager.java:52)
>     at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.notifyTermIndexUpdated(SCMStateMachine.java:330)
>     at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1566)
>     at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:239)
>     at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:182)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.util.concurrent.TimeoutException
>     at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
>     at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
>     at org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:225)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:111)
>     at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
>     ... 13 more
> 2022-06-15 20:00:18,122 INFO org.apache.ratis.server.RaftServer$Division: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF: shutdown
> 2022-06-15 20:00:18,122 INFO org.apache.ratis.util.JmxRegister: Successfully un-registered JMX Bean with object name Ratis:service=RaftServer,group=group-0B75F4A309CF,id=99c85376-060f-4b3c-8973-a2d2b1dd23e6
> 2022-06-15 20:00:18,122 INFO org.apache.ratis.server.impl.RoleInfo: 99c85376-060f-4b3c-8973-a2d2b1dd23e6: shutdown 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF-LeaderStateImpl
> 2022-06-15 20:00:18,124 INFO org.apache.ratis.server.impl.PendingRequests: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF-PendingRequests: sendNotLeaderResponses
> 2022-06-15 20:00:18,125 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->b6382f07-de2e-4986-8275-9146e73360a6-GrpcLogAppender: Wait interrupted by java.lang.InterruptedException
> 2022-06-15 20:00:18,128 INFO org.apache.hadoop.hdds.scm.ha.SCMStateMachine: current leader SCM steps down.
> {noformat}
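
For readers unfamiliar with the exception type in the trace above: the
com.sun.proxy.$Proxy19 frame indicates a JDK dynamic proxy, and
java.lang.reflect.Proxy wraps any checked exception that the invoked interface
method does not declare in an UndeclaredThrowableException. The sketch below
reproduces only that wrapping behaviour. The MoveScheduler interface, the
handler, and the class name are hypothetical stand-ins for illustration and
are not the Ozone code; the TimeoutException is simulated.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;
import java.util.concurrent.TimeoutException;

public class UndeclaredThrowableDemo {

    // Hypothetical interface: completeMove declares no checked exceptions.
    interface MoveScheduler {
        void completeMove(String containerId);
    }

    // Calls the proxied method and reports which exception the caller sees.
    static String describeFailure() {
        // Hypothetical handler that always fails with a checked exception,
        // standing in for a request that timed out.
        InvocationHandler handler = (proxy, method, args) -> {
            throw new TimeoutException("simulated request timeout");
        };

        MoveScheduler scheduler = (MoveScheduler) Proxy.newProxyInstance(
                MoveScheduler.class.getClassLoader(),
                new Class<?>[] {MoveScheduler.class},
                handler);

        try {
            scheduler.completeMove("container-1");
            return "no exception";
        } catch (UndeclaredThrowableException e) {
            // The checked TimeoutException is not declared by completeMove,
            // so the proxy wraps it; the original is available as the cause.
            return e.getClass().getSimpleName()
                    + " caused by " + e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure());
    }
}
```

Running main prints "UndeclaredThrowableException caused by TimeoutException",
which mirrors the shape of the trace: an unchecked wrapper reaching the caller
with the timeout as its cause.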



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
