Nilotpal Nandi created HDDS-6900:
------------------------------------

             Summary: SCM node went down with UndeclaredThrowableException 
while running container balancer
                 Key: HDDS-6900
                 URL: https://issues.apache.org/jira/browse/HDDS-6900
             Project: Apache Ozone
          Issue Type: Bug
          Components: SCM HA
            Reporter: Nilotpal Nandi


An SCM node went down with an UndeclaredThrowableException while the container 
balancer was running and the 2 other SCM nodes were shut down.
{noformat}
2022-06-15 20:00:15,634 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764-GrpcLogAppender: Leader has not got in touch with Follower 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764(c-1,m0,n310, attendVote=true, lastRpcSendTime=1, lastRpcResponseTime=32843) yet, just keep nextIndex unchanged and retry.
2022-06-15 20:00:16,887 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2022-06-15 20:00:16,888 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764-GrpcLogAppender: Leader has not got in touch with Follower 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->cbdea5d3-682d-43e6-a17a-bdce757b7764(c-1,m0,n310, attendVote=true, lastRpcSendTime=4, lastRpcResponseTime=34097) yet, just keep nextIndex unchanged and retry.
2022-06-15 20:00:18,121 ERROR org.apache.ratis.server.impl.StateMachineUpdater: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF-StateMachineUpdater caught a Throwable.
java.lang.reflect.UndeclaredThrowableException
    at com.sun.proxy.$Proxy19.completeMove(Unknown Source)
    at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.deleteSrcDnForMove(LegacyReplicationManager.java:1249)
    at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.lambda$onLeaderReadyAndOutOfSafeMode$40(LegacyReplicationManager.java:1871)
    at java.base/java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1603)
    at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.onLeaderReadyAndOutOfSafeMode(LegacyReplicationManager.java:1850)
    at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.notifyStatusChanged(LegacyReplicationManager.java:1649)
    at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.notifyStatusChanged(ReplicationManager.java:375)
    at org.apache.hadoop.hdds.scm.ha.SCMServiceManager.notifyStatusChanged(SCMServiceManager.java:52)
    at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.notifyTermIndexUpdated(SCMStateMachine.java:330)
    at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1566)
    at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:239)
    at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:182)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.util.concurrent.TimeoutException
    at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
    at org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:225)
    at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:111)
    at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
    ... 13 more
2022-06-15 20:00:18,122 INFO org.apache.ratis.server.RaftServer$Division: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF: shutdown
2022-06-15 20:00:18,122 INFO org.apache.ratis.util.JmxRegister: Successfully un-registered JMX Bean with object name Ratis:service=RaftServer,group=group-0B75F4A309CF,id=99c85376-060f-4b3c-8973-a2d2b1dd23e6
2022-06-15 20:00:18,122 INFO org.apache.ratis.server.impl.RoleInfo: 99c85376-060f-4b3c-8973-a2d2b1dd23e6: shutdown 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF-LeaderStateImpl
2022-06-15 20:00:18,124 INFO org.apache.ratis.server.impl.PendingRequests: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF-PendingRequests: sendNotLeaderResponses
2022-06-15 20:00:18,125 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 99c85376-060f-4b3c-8973-a2d2b1dd23e6@group-0B75F4A309CF->b6382f07-de2e-4986-8275-9146e73360a6-GrpcLogAppender: Wait interrupted by java.lang.InterruptedException
2022-06-15 20:00:18,128 INFO org.apache.hadoop.hdds.scm.ha.SCMStateMachine: current leader SCM steps down.
{noformat}
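
The stack trace above shows a checked TimeoutException surfacing from a 
com.sun.proxy dynamic proxy as an UndeclaredThrowableException. The sketch 
below reproduces only that general JDK behaviour: a java.lang.reflect.Proxy 
wraps any checked exception that the interface method does not declare. It is 
not the Ozone code. The MoveManager interface, the class name, and the 50 ms 
timeout are hypothetical stand-ins chosen for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class UndeclaredThrowableDemo {

  /** Hypothetical interface; completeMove declares no checked exceptions. */
  interface MoveManager {
    void completeMove(String containerId);
  }

  /** Builds a proxy whose handler waits on a future that never completes. */
  static MoveManager newTimingOutProxy() {
    InvocationHandler handler = (proxy, method, methodArgs) -> {
      CompletableFuture<Void> neverDone = new CompletableFuture<>();
      // Throws the checked TimeoutException, which completeMove does not declare.
      return neverDone.get(50, TimeUnit.MILLISECONDS);
    };
    return (MoveManager) Proxy.newProxyInstance(
        MoveManager.class.getClassLoader(),
        new Class<?>[] {MoveManager.class},
        handler);
  }

  /** Calls the proxy and returns the class name of the wrapped cause, or null. */
  static String causeOfProxyFailure() {
    try {
      newTimingOutProxy().completeMove("container-1");
      return null;
    } catch (UndeclaredThrowableException e) {
      // The proxy wrapped the undeclared checked exception thrown by the handler.
      return e.getCause().getClass().getName();
    }
  }

  public static void main(String[] args) {
    System.out.println(causeOfProxyFailure());
  }
}
```

In this sketch the caller only sees the unchecked wrapper, so a catch block 
for the checked exception alone would not intercept it. Whether that is what 
lets the exception reach StateMachineUpdater here is an assumption to be 
checked against the SCM code.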



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
