[ 
https://issues.apache.org/jira/browse/RATIS-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated RATIS-2190:
-----------------------------------
    Attachment: ozone-datanode.log.tgz

> Unexpected life cycle state in StateMachineUpdater.reload()
> -----------------------------------------------------------
>
>                 Key: RATIS-2190
>                 URL: https://issues.apache.org/jira/browse/RATIS-2190
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: Wei-Chiu Chuang
>            Priority: Major
>         Attachments: ozone-datanode.log.tgz
>
>
> Saw this assertion error in a Ozone DataNode after a lot of failed 
> appendEntries:
> {noformat}
> 3bbecffad8<-e693615a-d484-4165-8446-dff08cac5978#2:FAIL-t10,INCONSISTENCY,nextIndex=1006948,followerCommit=1006947,matchIndex=-1
> 2024-11-15 00:55:48,225 INFO 
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
>  group-B0F5FC71627F: Taking a snapshot
> at:(t:10, i:1007792) file 
> /var/lib/hadoop-ozone/datanode/ratis/data/72068d0a-08c9-457e-87f9-b0f5fc71627f/sm/snapshot.10_1007792
> 2024-11-15 00:55:48,227 INFO 
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
>  group-B0F5FC71627F: Finished taking a
> snapshot at:(t:10, i:1007792) 
> file:/var/lib/hadoop-ozone/datanode/ratis/data/72068d0a-08c9-457e-87f9-b0f5fc71627f/sm/snapshot.10_1007792
>  took: 2 ms
> 2024-11-15 00:55:48,227 INFO 
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.statemachine.impl.SimpleStateMachineStorage:
>  Deleting old snapshot at /var/lib/hadoop-ozone/datanode/ratis/d
> ata/72068d0a-08c9-457e-87f9-b0f5fc71627f/sm/snapshot.10_957779
> 2024-11-15 00:55:48,228 INFO 
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
>  e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater
> : Took a snapshot at index 1007792
> 2024-11-15 00:55:48,228 INFO 
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
>  e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater
> : snapshotIndex: updateIncreasingly 997792 -> 1007792
> 2024-11-15 00:55:48,228 INFO 
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.server.raftlog.RaftLog:
>  e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLog: 
> purge 1007
> 792
> 2024-11-15 00:55:48,296 INFO 
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLogWorker]-org.apache.ratis.server.raftlog.RaftLog:
>  e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLog: 
> purgeIn
> dex: updateToMax old=-1, new=857752, updated? true
> 2024-11-15 00:55:48,471 INFO 
> [grpc-default-executor-6]-org.apache.ratis.server.impl.SnapshotInstallationHandler:
>  e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F: receive 
> installSnapshot: 7cc563b3-14b5-4334-820b-5c3bbecffad8-
> >e693615a-d484-4165-8446-dff08cac5978#0-t10,notify:(t:10, i:857753)
> 2024-11-15 00:55:48,471 INFO 
> [grpc-default-executor-6]-org.apache.ratis.server.impl.SnapshotInstallationHandler:
>  e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F: Received 
> notification to install snapshot at index 857753
> 2024-11-15 00:55:48,493 INFO 
> [grpc-default-executor-6]-org.apache.ratis.server.impl.SnapshotInstallationHandler:
>  e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F: 
> notifyInstallSnapshot: nextIndex is 1008006 but the leader's f
> irst available index is 857753.
> 2024-11-15 00:55:48,494 WARN 
> [grpc-default-executor-6]-org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  Install snapshot notification received from Leader with termIndex: (t:10, 
> i:857753), terminat
> ing pipeline: group-B0F5FC71627F
> 2024-11-15 00:55:48,503 ERROR 
> [grpc-default-executor-6]-org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE on pipeline 
> PipelineID=72068d0a-08c9-457e-87f9-b0f5fc71627f.Reason : e
> 693615a-d484-4165-8446-dff08cac5978 closes pipeline when installSnapshot from 
> leader because leader snapshot doesn't contain any data to replay, all the 
> log entries prior to the snapshot might have been purged.So follower should 
> not
>  try to install snapshot from leader butcan close the pipeline here. It's in 
> follower state for 5171660ms
> 2024-11-15 00:55:48,504 INFO 
> [grpc-default-executor-6]-org.apache.ratis.server.impl.SnapshotInstallationHandler:
>  e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F: StateMachine 
> successfully installed snapshot index 857753. Rel
> oading the StateMachine.
> 2024-11-15 00:55:48,504 INFO 
> [grpc-default-executor-6]-org.apache.ratis.server.raftlog.RaftLog: 
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLog: 
> snapshotIndex: updateIncreasingly 857750 -> 857753
> 2024-11-15 00:55:48,504 INFO 
> [grpc-default-executor-6]-org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
>  
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLogWorker:
>  flushIndex: setUnconditionally
> 1008005 -> 857753
> 2024-11-15 00:55:48,504 INFO 
> [grpc-default-executor-6]-org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
>  
> e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-SegmentedRaftLogWorker:
>  safeCacheEvictIndex: setUncondi
> tionally 857752 -> 857753
> 2024-11-15 00:55:48,504 ERROR 
> [e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
>  e693615a-d484-4165-8446-dff08cac5978@group-B0F5FC71627F-StateMachineUpdate
> r caught a Throwable.
> java.lang.IllegalStateException
>         at 
> org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:35)
>         at 
> org.apache.ratis.server.impl.StateMachineUpdater.reload(StateMachineUpdater.java:226)
>         at 
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:190)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to