Wei-Chiu Chuang created RATIS-2192:
--------------------------------------

             Summary: Lots of errors after applying RATIS-2129
                 Key: RATIS-2192
                 URL: https://issues.apache.org/jira/browse/RATIS-2192
             Project: Ratis
          Issue Type: Bug
            Reporter: Wei-Chiu Chuang


To be honest, I am not sure whether this is related to RATIS-2129. However, I 
am using a build of Ratis 3.1.1 + RATIS-2129, and I am seeing all kinds of 
errors while running HBase on Ozone.

Failed to take a snapshot because the last applied txn is not current:
{noformat}
2024-11-16 00:10:31,035 INFO 
[grpc-default-executor-22]-org.apache.ratis.server.RaftServer: 
e693615a-d484-4165-8446-dff08cac5978: remove  FOLLOWER 
e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1:t229, 
leader=67eefe63-0930-42d7-a364-e46fde563ff1, 
voted=67eefe63-0930-42d7-a364-e46fde563ff1, 
raftlog=Memoized:e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-SegmentedRaftLog:OPENED:c1613342:last(t:229,
 i:1613343), conf=conf: {index: 1613340, 
cur=peers:[e693615a-d484-4165-8446-dff08cac5978|10.140.146.67:9856, 
67eefe63-0930-42d7-a364-e46fde563ff1|10.140.86.199:9856, 
7cc563b3-14b5-4334-820b-5c3bbecffad8|10.140.20.0:9856]|listeners:[], old=null} 
RUNNING
2024-11-16 00:10:31,038 INFO 
[grpc-default-executor-22]-org.apache.ratis.server.RaftServer$Division: 
e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1: shutdown
2024-11-16 00:10:31,039 INFO 
[grpc-default-executor-22]-org.apache.ratis.util.JmxRegister: Successfully 
un-registered JMX Bean with object name 
Ratis:service=RaftServer,group=group-AF4CEBD817A1,id=e693615a-d484-4165-8446-dff08cac5978
2024-11-16 00:10:31,039 INFO 
[grpc-default-executor-22]-org.apache.ratis.server.impl.RoleInfo: 
e693615a-d484-4165-8446-dff08cac5978: shutdown 
e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState
2024-11-16 00:10:31,039 INFO 
[grpc-default-executor-22]-org.apache.ratis.server.impl.StateMachineUpdater: 
e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater: 
set stopIndex = 1613342
2024-11-16 00:10:31,039 INFO 
[e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState]-org.apache.ratis.server.impl.FollowerState:
 e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState was 
interrupted
2024-11-16 00:10:31,043 ERROR 
[e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
 Failed to take snapshot  for group-AF4CEBD817A1 as the stateMachine is 
unhealthy. The last applied index is at (t:216, i:1613313)
2024-11-16 00:10:31,043 ERROR 
[e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
 e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater: 
Failed to take snapshot
org.apache.ratis.protocol.exceptions.StateMachineException: Failed to take 
snapshot  for group-AF4CEBD817A1 as the stateMachine is unhealthy. The last 
applied index is at (t:216, i:1613313)
        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.takeSnapshot(ContainerStateMachine.java:356)
        at 
org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:286)
        at 
org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:278)
        at 
org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:194)
        at java.lang.Thread.run(Thread.java:748)
{noformat}
Log entry not found:
{noformat}
2024-11-14 01:59:37,516 WARN 
[7cc563b3-14b5-4334-820b-5c3bbecffad8@group-0C8C280DCAED->67eefe63-0930-42d7-a364-e46fde563ff1-GrpcLogAppender-LogAppenderDaemon]-org.apache.ratis.server.leader.LogAppenderDaemon: 
7cc563b3-14b5-4334-820b-5c3bbecffad8@group-0C8C280DCAED->67eefe63-0930-42d7-a364-e46fde563ff1-GrpcLogAppender-LogAppenderDaemon
 failed
org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found: index 
= 3205
        at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:301)
        at 
org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:240)
        at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:387)
        at 
org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:262)
        at 
org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:80)
        at java.lang.Thread.run(Thread.java:748)
{noformat}
HDDS-11720 seems to be related too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
