[ https://issues.apache.org/jira/browse/HDDS-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Andika updated HDDS-11785:
-------------------------------
Target Version/s: 2.0.0, 1.4.2
> DataNode aborts state machine because ContainerStateMachine does not know
> follower's next index
> -----------------------------------------------------------------------------------------------
>
> Key: HDDS-11785
> URL: https://issues.apache.org/jira/browse/HDDS-11785
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Critical
> Labels: pull-request-available
> Fix For: 2.0.0
>
>
> We have a DataNode that encountered an exception while removing state machine
> data. After that, the state machine was closed, and the DataNode had no
> available pipelines and became idle.
>
> Eventually, SCM could not find any healthy DataNode or pipeline, and after a
> restart it could not exit safe mode.
>
> cc: [~szetszwo] It seems this can only happen if
> hdds.datanode.wait.on.all.followers = true.
> {noformat}
> 2024-11-20 15:41:16,232 ERROR
> [09eca63b-ce87-43a5-ae29-a373c6c8791e@group-B6AD8655BA5D-StateMachineUpdater]-org.apache.ratis.server.impl.StateMachineUpdater:
> 09eca63b-ce87-43a5-ae29-a373c6c8791e@group-B6AD8655BA5D-StateMachineUpdater
> caught a Throwable.
> java.lang.RuntimeException: java.util.NoSuchElementException: No value present
>     at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.removeStateMachineDataIfNeeded(ContainerStateMachine.java:880)
>     at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.notifyTermIndexUpdated(ContainerStateMachine.java:847)
>     at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1755)
>     at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:242)
>     at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:184)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.NoSuchElementException: No value present
>     at java.util.OptionalLong.getAsLong(OptionalLong.java:118)
>     at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.removeStateMachineDataIfNeeded(ContainerStateMachine.java:874)
>     ... 5 more
> 2024-11-20 15:41:16,233 INFO
> [09eca63b-ce87-43a5-ae29-a373c6c8791e@group-B6AD8655BA5D-StateMachineUpdater]-org.apache.ratis.server.RaftServer$Division:
> 09eca63b-ce87-43a5-ae29-a373c6c8791e@group-B6AD8655BA5D: shutdown
> 2024-11-20 15:41:16,233 INFO
> [09eca63b-ce87-43a5-ae29-a373c6c8791e@group-B6AD8655BA5D-StateMachineUpdater]-org.apache.ratis.util.JmxRegister:
> Successfully un-registered JMX Bean with object name
> Ratis:service=RaftServer,group=group-B6AD8655BA5D,id=09eca63b-ce87-43a5-ae29-a373c6c8791e
> 2024-11-20 15:41:16,233 INFO
> [09eca63b-ce87-43a5-ae29-a373c6c8791e@group-B6AD8655BA5D-StateMachineUpdater]-org.apache.ratis.server.impl.RoleInfo:
> 09eca63b-ce87-43a5-ae29-a373c6c8791e: shutdown
> 09eca63b-ce87-43a5-ae29-a373c6c8791e@group-B6AD8655BA5D-LeaderStateImpl
> 2024-11-20 15:41:16,237 INFO
> [09eca63b-ce87-43a5-ae29-a373c6c8791e@group-B6AD8655BA5D-StateMachineUpdater]-org.apache.ratis.server.impl.PendingRequests:
> 09eca63b-ce87-43a5-ae29-a373c6c8791e@group-B6AD8655BA5D-PendingRequests:
> sendNotLeaderResponses
> 2024-11-20 15:41:16,239 INFO
> [09eca63b-ce87-43a5-ae29-a373c6c8791e@group-B6AD8655BA5D-StateMachineUpdater]-org.apache.ratis.server.RaftServer$Division:
> 09eca63b-ce87-43a5-ae29-a373c6c8791e@group-B6AD8655BA5D: closes. applyIndex: 188
> 2024-11-20 15:41:17,232 INFO
> [09eca63b-ce87-43a5-ae29-a373c6c8791e@group-B6AD8655BA5D-StateMachineUpdater]-org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> 09eca63b-ce87-43a5-ae29-a373c6c8791e@group-B6AD8655BA5D-SegmentedRaftLogWorker close() {noformat}
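The trace shows `OptionalLong.getAsLong()` throwing inside `removeStateMachineDataIfNeeded`, and the issue title says the ContainerStateMachine does not know the followers' next index. A plausible reading is that the code takes the minimum over the followers' next indices and that this set was empty, so `min()` returned an empty `OptionalLong`. The sketch below is a hypothetical model of that failure mode, not Ozone's actual code: the class `FollowerMinIndex`, its methods, and the eviction rule in `removableUpTo` (remove only entries strictly below the smallest next index, capped at the applied index) are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;
import java.util.OptionalLong;

/**
 * Hypothetical model of the failing computation; NOT Ozone's actual code.
 * followerNextIndices stands in for the leader's view of its followers' progress.
 */
public final class FollowerMinIndex {
  private FollowerMinIndex() {}

  /** Unguarded: getAsLong() throws NoSuchElementException when the array is empty. */
  static long unguardedMin(long[] followerNextIndices) {
    return Arrays.stream(followerNextIndices).min().getAsLong();
  }

  /**
   * Guarded: returns the highest index whose cached data could be removed
   * (assumed here to be strictly below the smallest next index, capped at
   * the applied index), or empty when follower progress is unknown.
   */
  static OptionalLong removableUpTo(long appliedIndex, long[] followerNextIndices) {
    OptionalLong minNext = Arrays.stream(followerNextIndices).min();
    if (!minNext.isPresent()) {
      // Skip cleanup for now instead of letting the exception abort the state machine.
      return OptionalLong.empty();
    }
    return OptionalLong.of(Math.min(minNext.getAsLong() - 1, appliedIndex));
  }

  /** Wraps the cause in a RuntimeException, matching the top-level line of the log. */
  static String wrappedFailure(long[] followerNextIndices) {
    try {
      unguardedMin(followerNextIndices);
      return "no failure";
    } catch (NoSuchElementException e) {
      return new RuntimeException(e).toString();
    }
  }

  public static void main(String[] args) {
    long[] none = new long[0];
    System.out.println(wrappedFailure(none));
    System.out.println(removableUpTo(188, none));
    // Illustrative follower indices; 188 is the applyIndex from the log.
    System.out.println(removableUpTo(188, new long[] {150, 190}));
  }
}
```

Running `main` prints `java.lang.RuntimeException: java.util.NoSuchElementException: No value present` (the same message as the StateMachineUpdater log), then `OptionalLong.empty`, then `OptionalLong[149]`. Returning an empty result rather than throwing lets the caller simply retry cleanup on a later index update once follower progress is known.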
--
This message was sent by Atlassian Jira
(v8.20.10#820010)