[ https://issues.apache.org/jira/browse/RATIS-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei-Chiu Chuang updated RATIS-2192:
-----------------------------------
Attachment: ozone-datanode.2.tgz
> Lots of errors after applying RATIS-2129
> ----------------------------------------
>
> Key: RATIS-2192
> URL: https://issues.apache.org/jira/browse/RATIS-2192
> Project: Ratis
> Issue Type: Bug
> Reporter: Wei-Chiu Chuang
> Priority: Major
> Attachments: ozone-datanode.1.tgz, ozone-datanode.2.tgz,
> ozone-datanode.3.tgz
>
>
> To be honest, I am not sure whether this is related to RATIS-2129. But I am
> using a build that is Ratis 3.1.1 + RATIS-2129, and I am seeing various errors
> when running HBase on Ozone.
> Failed to take snapshot because the last applied txn is not current:
> {noformat}
> 2024-11-16 00:10:31,035 INFO
> [grpc-default-executor-22]-org.apache.ratis.server.RaftServer:
> e693615a-d484-4165-8446-dff08cac5978: remove FOLLOWER
> e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1:t229,
> leader=67eefe63-0930-42d7-a364-e46fde563ff1,
> voted=67eefe63-0930-42d7-a364-e46fde563ff1,
> raftlog=Memoized:e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-SegmentedRaftLog:OPENED:c1613342:last(t:229,
> i:1613343), conf=conf: {index: 1613340,
> cur=peers:[e693615a-d484-4165-8446-dff08cac5978|10.140.146.67:9856,
> 67eefe63-0930-42d7-a364-e46fde563ff1|10.140.86.199:9856,
> 7cc563b3-14b5-4334-820b-5c3bbecffad8|10.140.20.0:9856]|listeners:[],
> old=null} RUNNING
> 2024-11-16 00:10:31,038 INFO
> [grpc-default-executor-22]-org.apache.ratis.server.RaftServer$Division:
> e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1: shutdown
> 2024-11-16 00:10:31,039 INFO
> [grpc-default-executor-22]-org.apache.ratis.util.JmxRegister: Successfully
> un-registered JMX Bean with object name
> Ratis:service=RaftServer,group=group-AF4CEBD817A1,id=e693615a-d484-4165-8446-dff08cac5978
> 2024-11-16 00:10:31,039 INFO
> [grpc-default-executor-22]-org.apache.ratis.server.impl.RoleInfo:
> e693615a-d484-4165-8446-dff08cac5978: shutdown
> e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState
> 2024-11-16 00:10:31,039 INFO
> [grpc-default-executor-22]-org.apache.ratis.server.impl.StateMachineUpdater:
> e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater:
> set stopIndex = 1613342
> 2024-11-16 00:10:31,039 INFO
> [e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState]-org.apache.ratis.server.impl.FollowerState:
> e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-FollowerState was
> interrupted
> 2024-11-16 00:10:31,043 ERROR
> [e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
> Failed to take snapshot for group-AF4CEBD817A1 as the stateMachine is
> unhealthy. The last applied index is at (t:216, i:1613313)
> 2024-11-16 00:10:31,043 ERROR
> [e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
> e693615a-d484-4165-8446-dff08cac5978@group-AF4CEBD817A1-StateMachineUpdater:
> Failed to take snapshot
> org.apache.ratis.protocol.exceptions.StateMachineException: Failed to take
> snapshot for group-AF4CEBD817A1 as the stateMachine is unhealthy. The last
> applied index is at (t:216, i:1613313)
> at
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.takeSnapshot(ContainerStateMachine.java:356)
> at
> org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:286)
> at
> org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:278)
> at
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:194)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Log entry not found:
> {noformat}
> 2024-11-14 01:59:37,516 WARN
> [7cc563b3-14b5-4334-820b-5c3bbecffad8@group-0C8C280DCAED->67eefe63-0930-42d7-a364-e46fde563ff1-GrpcLogAppender-LogAppenderDaemon]-org.apache.ratis.server.leader.LogAppenderDaemon:
> 7cc563b3-14b5-4334-820b-5c3bbecffad8@group-0C8C280DCAED->67eefe63-0930-42d7-a364-e46fde563ff1-GrpcLogAppender-LogAppenderDaemon
> failed
> org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found:
> index = 3205
> at
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:301)
> at
> org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:240)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:387)
> at
> org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:262)
> at
> org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:80)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> HDDS-11720 seems to be related too.
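
For anyone triaging the first log, it may help to quantify how far the state machine was behind when the division shut down. The sketch below is only an illustration, not a diagnosis: it pulls the stopIndex and the last applied (term, index) out of the two relevant messages and prints the difference. The regular expressions, the helper name applied_gap, and the assumption that wrapped log lines have been joined into single lines are mine and are not part of Ratis or Ozone.

```python
import re

# Two messages copied from the first log in this report, with the
# wrapped lines joined (an assumption about how the log is preprocessed).
LOG = """
StateMachineUpdater: set stopIndex = 1613342
Failed to take snapshot for group-AF4CEBD817A1 as the stateMachine is unhealthy. The last applied index is at (t:216, i:1613313)
"""

# Hypothetical patterns matching the message text shown above.
STOP_INDEX = re.compile(r"set stopIndex = (\d+)")
LAST_APPLIED = re.compile(r"last applied index is at \(t:(\d+), i:(\d+)\)")


def applied_gap(log_text):
    """Return (stop_index, applied_term, applied_index, gap).

    Returns None if either message is missing from log_text.
    """
    stop = STOP_INDEX.search(log_text)
    applied = LAST_APPLIED.search(log_text)
    if stop is None or applied is None:
        return None
    stop_index = int(stop.group(1))
    term, index = int(applied.group(1)), int(applied.group(2))
    return stop_index, term, index, stop_index - index


print(applied_gap(LOG))
```

For the excerpt above this reports a gap of 29 entries between the stopIndex (1613342) and the last applied index (1613313), and the last applied term (216) is older than the current term in the log (229).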