Wei-Chiu Chuang created RATIS-2190:
--------------------------------------

             Summary: Unexpected life cycle state in 
StateMachineUpdater.reload()
                 Key: RATIS-2190
                 URL: https://issues.apache.org/jira/browse/RATIS-2190
             Project: Ratis
          Issue Type: Bug
            Reporter: Wei-Chiu Chuang
         Attachments: ozone-datanode.log.tgz

Saw this assertion error in a Ozone DataNode after a lot of failed 
appendEntries:

{noformat}


3bbecffad8<-e693615a-d484-4165-8446-dff08cac5978#2:FAIL-t10,INCONSISTENCY,nextIndex=1006948,followerCommit=1006947,matchIndex=-1
2024-11-15 00:55:48,225 INFO 
[e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
 group-B0F5FC71627F: Taking a snapshot
at:(t:10, i:1007792) file 
/var/lib/hadoop-ozone/datanode/ratis/data/72068d0a-08c9-457e-87f9-b0f5fc71627f/sm/snapshot.10_1007792
2024-11-15 00:55:48,227 INFO 
[e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
 group-B0F5FC71627F: Finished taking a
snapshot at:(t:10, i:1007792) 
file:/var/lib/hadoop-ozone/datanode/ratis/data/72068d0a-08c9-457e-87f9-b0f5fc71627f/sm/snapshot.10_1007792
 took: 2 ms
2024-11-15 00:55:48,227 INFO 
[e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.statemachine.impl.SimpleStateMachineStorage:
 Deleting old snapshot at /var/lib/hadoop-ozone/datanode/ratis/d
ata/72068d0a-08c9-457e-87f9-b0f5fc71627f/sm/snapshot.10_957779
2024-11-15 00:55:48,228 INFO 
[e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
 e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater
: Took a snapshot at index 1007792
2024-11-15 00:55:48,228 INFO 
[e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
 e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater
: snapshotIndex: updateIncreasingly 997792 -> 1007792
2024-11-15 00:55:48,228 INFO 
[e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.server.raftlog.RaftLog:
 e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLog: 
purge 1007
792
2024-11-15 00:55:48,296 INFO 
[e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLogWorker]-org.apache.ratis.server.raftlog.RaftLog:
 e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLog: 
purgeIn
dex: updateToMax old=-1, new=857752, updated? true
2024-11-15 00:55:48,471 INFO 
[grpc-default-executor-6]-org.apache.ratis.server.impl.SnapshotInstallationHandler:
 e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F: receive 
installSnapshot: 7cc563b3-14b5-4334-820b-5c3bbecffad8-
>e693615a-d484-4165-8446-dff08cac5978#0-t10,notify:(t:10, i:857753)
2024-11-15 00:55:48,471 INFO 
[grpc-default-executor-6]-org.apache.ratis.server.impl.SnapshotInstallationHandler:
 e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F: Received notification 
to install snapshot at index 857753
2024-11-15 00:55:48,493 INFO 
[grpc-default-executor-6]-org.apache.ratis.server.impl.SnapshotInstallationHandler:
 e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F: 
notifyInstallSnapshot: nextIndex is 1008006 but the leader's f
irst available index is 857753.
2024-11-15 00:55:48,494 WARN 
[grpc-default-executor-6]-org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
 Install snapshot notification received from Leader with termIndex: (t:10, 
i:857753), terminat
ing pipeline: group-B0F5FC71627F
2024-11-15 00:55:48,503 ERROR 
[grpc-default-executor-6]-org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
 pipeline Action CLOSE on pipeline 
PipelineID=72068d0a-08c9-457e-87f9-b0f5fc71627f.Reason : e
693615a-d484-4165-8446-dff08cac5978 closes pipeline when installSnapshot from 
leader because leader snapshot doesn't contain any data to replay, all the log 
entries prior to the snapshot might have been purged.So follower should not
 try to install snapshot from leader butcan close the pipeline here. It's in 
follower state for 5171660ms
2024-11-15 00:55:48,504 INFO 
[grpc-default-executor-6]-org.apache.ratis.server.impl.SnapshotInstallationHandler:
 e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F: StateMachine 
successfully installed snapshot index 857753. Rel
oading the StateMachine.
2024-11-15 00:55:48,504 INFO 
[grpc-default-executor-6]-org.apache.ratis.server.raftlog.RaftLog: 
e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLog: 
snapshotIndex: updateIncreasingly 857750 -> 857753
2024-11-15 00:55:48,504 INFO 
[grpc-default-executor-6]-org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
 
e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLogWorker: 
flushIndex: setUnconditionally
1008005 -> 857753
2024-11-15 00:55:48,504 INFO 
[grpc-default-executor-6]-org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
 
e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLogWorker: 
safeCacheEvictIndex: setUncondi
tionally 857752 -> 857753
2024-11-15 00:55:48,504 ERROR 
[e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
 e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdate
r caught a Throwable.
java.lang.IllegalStateException
        at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:35)
        at 
org.apache.ratis.server.impl.StateMachineUpdater.reload(StateMachineUpdater.java:226)
        at 
org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:190)
        at java.lang.Thread.run(Thread.java:748)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to