[
https://issues.apache.org/jira/browse/RATIS-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei-Chiu Chuang updated RATIS-2190:
-----------------------------------
Attachment: ozone-datanode.log.tgz
> Unexpected life cycle state in StateMachineUpdater.reload()
> -----------------------------------------------------------
>
> Key: RATIS-2190
> URL: https://issues.apache.org/jira/browse/RATIS-2190
> Project: Ratis
> Issue Type: Bug
> Reporter: Wei-Chiu Chuang
> Priority: Major
> Attachments: ozone-datanode.log.tgz
>
>
> Saw this assertion error in a Ozone DataNode after a lot of failed
> appendEntries:
> {noformat}
> 3bbecffad8<-e693615a-d484-4165-8446-dff08cac5978#2:FAIL-t10,INCONSISTENCY,nextIndex=1006948,followerCommit=1006947,matchIndex=-1
> 2024-11-15 00:55:48,225 INFO
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
> group-B0F5FC71627F: Taking a snapshot
> at:(t:10, i:1007792) file
> /var/lib/hadoop-ozone/datanode/ratis/data/72068d0a-08c9-457e-87f9-b0f5fc71627f/sm/snapshot.10_1007792
> 2024-11-15 00:55:48,227 INFO
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
> group-B0F5FC71627F: Finished taking a
> snapshot at:(t:10, i:1007792)
> file:/var/lib/hadoop-ozone/datanode/ratis/data/72068d0a-08c9-457e-87f9-b0f5fc71627f/sm/snapshot.10_1007792
> took: 2 ms
> 2024-11-15 00:55:48,227 INFO
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.statemachine.impl.SimpleStateMachineStorage:
> Deleting old snapshot at /var/lib/hadoop-ozone/datanode/ratis/d
> ata/72068d0a-08c9-457e-87f9-b0f5fc71627f/sm/snapshot.10_957779
> 2024-11-15 00:55:48,228 INFO
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater
> : Took a snapshot at index 1007792
> 2024-11-15 00:55:48,228 INFO
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater
> : snapshotIndex: updateIncreasingly 997792 -> 1007792
> 2024-11-15 00:55:48,228 INFO
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.server.raftlog.RaftLog:
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLog:
> purge 1007
> 792
> 2024-11-15 00:55:48,296 INFO
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLogWorker]-org.apache.ratis.server.raftlog.RaftLog:
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLog:
> purgeIn
> dex: updateToMax old=-1, new=857752, updated? true
> 2024-11-15 00:55:48,471 INFO
> [grpc-default-executor-6]-org.apache.ratis.server.impl.SnapshotInstallationHandler:
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F: receive
> installSnapshot: 7cc563b3-14b5-4334-820b-5c3bbecffad8-
> >e693615a-d484-4165-8446-dff08cac5978#0-t10,notify:(t:10, i:857753)
> 2024-11-15 00:55:48,471 INFO
> [grpc-default-executor-6]-org.apache.ratis.server.impl.SnapshotInstallationHandler:
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F: Received
> notification to install snapshot at index 857753
> 2024-11-15 00:55:48,493 INFO
> [grpc-default-executor-6]-org.apache.ratis.server.impl.SnapshotInstallationHandler:
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F:
> notifyInstallSnapshot: nextIndex is 1008006 but the leader's f
> irst available index is 857753.
> 2024-11-15 00:55:48,494 WARN
> [grpc-default-executor-6]-org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
> Install snapshot notification received from Leader with termIndex: (t:10,
> i:857753), terminat
> ing pipeline: group-B0F5FC71627F
> 2024-11-15 00:55:48,503 ERROR
> [grpc-default-executor-6]-org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
> pipeline Action CLOSE on pipeline
> PipelineID=72068d0a-08c9-457e-87f9-b0f5fc71627f.Reason : e
> 693615a-d484-4165-8446-dff08cac5978 closes pipeline when installSnapshot from
> leader because leader snapshot doesn't contain any data to replay, all the
> log entries prior to the snapshot might have been purged.So follower should
> not
> try to install snapshot from leader butcan close the pipeline here. It's in
> follower state for 5171660ms
> 2024-11-15 00:55:48,504 INFO
> [grpc-default-executor-6]-org.apache.ratis.server.impl.SnapshotInstallationHandler:
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F: StateMachine
> successfully installed snapshot index 857753. Rel
> oading the StateMachine.
> 2024-11-15 00:55:48,504 INFO
> [grpc-default-executor-6]-org.apache.ratis.server.raftlog.RaftLog:
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLog:
> snapshotIndex: updateIncreasingly 857750 -> 857753
> 2024-11-15 00:55:48,504 INFO
> [grpc-default-executor-6]-org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
>
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLogWorker:
> flushIndex: setUnconditionally
> 1008005 -> 857753
> 2024-11-15 00:55:48,504 INFO
> [grpc-default-executor-6]-org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
>
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLogWorker:
> safeCacheEvictIndex: setUncondi
> tionally 857752 -> 857753
> 2024-11-15 00:55:48,504 ERROR
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdate
> r caught a Throwable.
> java.lang.IllegalStateException
> at
> org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:35)
> at
> org.apache.ratis.server.impl.StateMachineUpdater.reload(StateMachineUpdater.java:226)
> at
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:190)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)