Igor created IGNITE-21639:
-----------------------------
Summary: Server after kill does not start and stuck on election
Key: IGNITE-21639
URL: https://issues.apache.org/jira/browse/IGNITE-21639
Project: Ignite
Issue Type: Improvement
Components: general, networking, platforms
Affects Versions: 3.0.0-beta1
Reporter: Igor
Attachments:
poc-tester-SERVER-192.168.1.117-id-0-2024-02-29-22-56-11-client.log.0
*Steps to reproduce:*
# Start a 3-node cluster with each node on a separate machine (not in Docker).
# Insert about 500 000 rows across 500 tables. The replication factor is 3.
# Kill one node.
# Start the killed node.
*Expected:*
The node starts, joins the cluster, and works normally.
*Actual:*
The node gets stuck during startup and keeps repeating messages like these:
{code:java}
2024-02-29 23:06:21:261 +0300 [INFO][%poc-tester-SERVER-192.168.1.117-id-0%JRaft-ElectionTimer-18][NodeImpl] Unsuccessful election round number 128
2024-02-29 23:06:21:261 +0300 [INFO][%poc-tester-SERVER-192.168.1.117-id-0%JRaft-ElectionTimer-18][NodeImpl] Node <154_part_24/poc-tester-SERVER-192.168.1.117-id-0> term 3 start preVote.
2024-02-29 23:06:21:282 +0300 [ERROR][%poc-tester-SERVER-192.168.1.117-id-0%JRaft-FSMCaller-Disruptor_stripe_5-0][StripedDisruptor] Handle disruptor event error [name=%poc-tester-SERVER-192.168.1.117-id-0%JRaft-FSMCaller-Disruptor-, event=org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTask@efb699b, hasHandler=false]
java.lang.AssertionError: Safe time reordering detected [current=112016525904248838, proposed=112016523364991002]
    at org.apache.ignite.internal.table.distributed.raft.PartitionListener.lambda$onWrite$1(PartitionListener.java:169)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWrite(PartitionListener.java:159)
    at org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:674)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:557)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:525)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:444)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:136)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:130)
    at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:266)
    at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:231)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137)
    at java.base/java.lang.Thread.run(Thread.java:829)
{code}
[^poc-tester-SERVER-192.168.1.117-id-0-2024-02-29-22-56-11-client.log.0]
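To help read the assertion message above, here is a minimal standalone sketch of the kind of monotonicity check it implies. This is not the actual {{PartitionListener}} code: the class {{SafeTimeCheck}}, its methods, and the strict "greater than" comparison are hypothetical. It also assumes the hybrid timestamp keeps physical milliseconds in the high bits and a 16-bit logical counter in the low bits. That layout is an assumption, so treat the computed lag as a rough estimate only.

```java
public class SafeTimeCheck {
    // Assumption: the low 16 bits hold a logical counter, the rest is physical time in ms.
    static final int LOGICAL_BITS = 16;

    /** Extracts the (assumed) physical part of a hybrid timestamp, in milliseconds. */
    static long physicalMillis(long hybridTimestamp) {
        return hybridTimestamp >>> LOGICAL_BITS;
    }

    /** Hypothetical check: a proposed safe time must move forward relative to the current one. */
    static boolean isMonotonic(long current, long proposed) {
        return proposed > current;
    }

    public static void main(String[] args) {
        // Values copied from the AssertionError in the log above.
        long current = 112016525904248838L;
        long proposed = 112016523364991002L;

        System.out.println("monotonic: " + isMonotonic(current, proposed));
        System.out.println("proposed lags current by (ms, under the assumed layout): "
                + (physicalMillis(current) - physicalMillis(proposed)));
    }
}
```

Under these assumptions, the proposed safe time is older than the one already applied on the restarted node, which is consistent with the "reordering" wording of the assertion.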