[ https://issues.apache.org/jira/browse/IGNITE-26051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Puchkovskiy updated IGNITE-26051:
---------------------------------------
Description:
ItPartitionDestructionTest#partitionIsDestroyedOnTableDestructionOnNodeRecoveryWithColocation()
demonstrates the problem: remove the muting annotation from the test and run
it. The test passes, but the following error appears in the log:
{noformat}
[2025-07-28T15:30:28,532][ERROR][%ipdt_pidotdonrwc_3344%JRaft-FSMCaller-Disruptor_stripe_0-0][FailureManager] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=CRITICAL_ERROR]
org.apache.ignite.internal.failure.StackTraceCapturingException: Unknown error
    at org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:191)
    at org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:168)
    at org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:866)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:578)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:544)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:462)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:131)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:125)
    at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:330)
    at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:287)
    at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.AssertionError: No RAFT table processor found by table ID 21
    at org.apache.ignite.internal.partition.replicator.raft.handlers.WriteIntentSwitchCommandHandler.raftTableProcessor(WriteIntentSwitchCommandHandler.java:70)
    at org.apache.ignite.internal.partition.replicator.raft.handlers.WriteIntentSwitchCommandHandler.handleInternally(WriteIntentSwitchCommandHandler.java:58)
    at org.apache.ignite.internal.partition.replicator.raft.handlers.WriteIntentSwitchCommandHandler.handleInternally(WriteIntentSwitchCommandHandler.java:33)
    at org.apache.ignite.internal.partition.replicator.raft.handlers.AbstractCommandHandler.handle(AbstractCommandHandler.java:45)
    at org.apache.ignite.internal.partition.replicator.raft.ZonePartitionRaftListener.processWriteCommand(ZonePartitionRaftListener.java:228)
    at org.apache.ignite.internal.partition.replicator.raft.ZonePartitionRaftListener.lambda$onWrite$1(ZonePartitionRaftListener.java:162)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at org.apache.ignite.internal.partition.replicator.raft.ZonePartitionRaftListener.onWrite(ZonePartitionRaftListener.java:160)
    at org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:845)
    ... 10 more
{noformat}
There appears to be a race between the removal of a table's raft processor
(due to table destruction) and the application of a write intent switch
command (which tries to invoke that processor).
It seems that this can be easily fixed by ignoring tables for which there is no
processor. On the one hand, this seems fine, as the table is already destroyed
together with all its data and not-yet-switched write intents; on the other
hand, this will lead to partial application of the write intent switch (it is
applied only to the tables that still have a processor), so it should be
carefully considered whether this is actually OK.
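The proposed fix could look roughly like the sketch below. This is only an
illustration of the "ignore tables without a processor" idea, not the actual
Ignite code: the class WriteIntentSwitchSketch, the TableProcessor interface
and all method names are hypothetical, and a plain ConcurrentHashMap stands in
for whatever registry the real handler uses. The sketch returns the skipped
table IDs so that the partial application stays visible to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names are taken from the Ignite code base. */
class WriteIntentSwitchSketch {
    /** Stand-in for a per-table RAFT processor. */
    interface TableProcessor {
        void switchWriteIntents(String txId);
    }

    /** Table ID -> processor; an entry is removed when its table is destroyed. */
    private final Map<Integer, TableProcessor> processors = new ConcurrentHashMap<>();

    void addTable(int tableId, TableProcessor processor) {
        processors.put(tableId, processor);
    }

    void destroyTable(int tableId) {
        processors.remove(tableId);
    }

    /**
     * Applies the switch to every listed table that still has a processor.
     * Returns the IDs of the tables that were skipped because they are gone.
     */
    List<Integer> applySwitch(String txId, List<Integer> tableIds) {
        List<Integer> skipped = new ArrayList<>();
        for (int tableId : tableIds) {
            TableProcessor processor = processors.get(tableId);
            if (processor == null) {
                // Table already destroyed: skip it instead of throwing an AssertionError.
                skipped.add(tableId);
                continue;
            }
            processor.switchWriteIntents(txId);
        }
        return skipped;
    }

    public static void main(String[] args) {
        WriteIntentSwitchSketch sketch = new WriteIntentSwitchSketch();
        sketch.addTable(20, txId -> System.out.println("Switched write intents of table 20 for " + txId));
        sketch.addTable(21, txId -> System.out.println("Switched write intents of table 21 for " + txId));

        // Table 21 is destroyed before the switch command gets applied.
        sketch.destroyTable(21);

        List<Integer> skipped = sketch.applySwitch("tx-1", List.of(20, 21));
        System.out.println("Skipped tables: " + skipped);
    }
}
```

Whether silently skipping is acceptable is exactly the open question above; in
this sketch the skipped IDs are at least reported back, so a caller could log
them or decide to fail.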
> With enabled colocation, write intent switch might fail due to table
> destruction
> --------------------------------------------------------------------------------
>
> Key: IGNITE-26051
> URL: https://issues.apache.org/jira/browse/IGNITE-26051
> Project: Ignite
> Issue Type: Bug
> Reporter: Roman Puchkovskiy
> Assignee: Roman Puchkovskiy
> Priority: Major
> Labels: MakeTeamcityGreenAgain, ignite-3