[ 
https://issues.apache.org/jira/browse/IGNITE-26051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Puchkovskiy updated IGNITE-26051:
---------------------------------------
    Description: 
ItPartitionDestructionTest#partitionIsDestroyedOnTableDestructionOnNodeRecoveryWithColocation()
demonstrates the problem: it is enough to remove the muting annotation from the
test and run it. The test passes, but the following appears in the log:
 
{noformat}
[2025-07-28T15:30:28,532][ERROR][%ipdt_pidotdonrwc_3344%JRaft-FSMCaller-Disruptor_stripe_0-0][FailureManager] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=CRITICAL_ERROR]
org.apache.ignite.internal.failure.StackTraceCapturingException: Unknown error
    at org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:191)
    at org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:168)
    at org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:866)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:578)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:544)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:462)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:131)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:125)
    at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:330)
    at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:287)
    at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.AssertionError: No RAFT table processor found by table ID 21
    at org.apache.ignite.internal.partition.replicator.raft.handlers.WriteIntentSwitchCommandHandler.raftTableProcessor(WriteIntentSwitchCommandHandler.java:70)
    at org.apache.ignite.internal.partition.replicator.raft.handlers.WriteIntentSwitchCommandHandler.handleInternally(WriteIntentSwitchCommandHandler.java:58)
    at org.apache.ignite.internal.partition.replicator.raft.handlers.WriteIntentSwitchCommandHandler.handleInternally(WriteIntentSwitchCommandHandler.java:33)
    at org.apache.ignite.internal.partition.replicator.raft.handlers.AbstractCommandHandler.handle(AbstractCommandHandler.java:45)
    at org.apache.ignite.internal.partition.replicator.raft.ZonePartitionRaftListener.processWriteCommand(ZonePartitionRaftListener.java:228)
    at org.apache.ignite.internal.partition.replicator.raft.ZonePartitionRaftListener.lambda$onWrite$1(ZonePartitionRaftListener.java:162)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at org.apache.ignite.internal.partition.replicator.raft.ZonePartitionRaftListener.onWrite(ZonePartitionRaftListener.java:160)
    at org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:845)
    ... 10 more
{noformat}
 
There appears to be a race between removal of the table's Raft processor (due
to table destruction) and application of a write intent switch command (which
tries to invoke that processor).
 
It seems that this can be easily fixed by ignoring tables for which there is no
processor. On the one hand, this seems fine, as the table is already destroyed
together with all its data and not-yet-switched write intents; on the other
hand, this will lead to partial application of a write intent switch, so it
should be carefully considered whether this is acceptable.
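
To make the "ignore tables without a processor" option concrete, here is a minimal, self-contained Java sketch. It is not the actual WriteIntentSwitchCommandHandler code: the class WriteIntentSwitchSketch, the TableProcessor interface and all method names are hypothetical stand-ins, and the real handler may be structured quite differently. The sketch only shows the idea of skipping a missing processor instead of failing with an AssertionError, and it reports which tables were skipped so that the partial application stays visible to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; NOT the real Ignite handler. All names are assumptions. */
final class WriteIntentSwitchSketch {
    /** Stand-in for a per-table Raft processor (hypothetical interface). */
    interface TableProcessor {
        void switchWriteIntents(String txId);
    }

    /** Processors keyed by table ID; entries disappear when a table is destroyed. */
    private final Map<Integer, TableProcessor> processorsByTableId = new ConcurrentHashMap<>();

    void addTable(int tableId, TableProcessor processor) {
        processorsByTableId.put(tableId, processor);
    }

    /** Models table destruction, which may race with command application. */
    void removeTable(int tableId) {
        processorsByTableId.remove(tableId);
    }

    /**
     * Applies the switch to every listed table that still has a processor.
     * Tables without a processor are skipped rather than treated as an error.
     *
     * @return IDs of the tables that were skipped (partial application).
     */
    List<Integer> applySwitch(String txId, List<Integer> tableIds) {
        List<Integer> skipped = new ArrayList<>();
        for (int tableId : tableIds) {
            TableProcessor processor = processorsByTableId.get(tableId);
            if (processor == null) {
                // Table was already destroyed: nothing to switch for it.
                skipped.add(tableId);
                continue;
            }
            processor.switchWriteIntents(txId);
        }
        return skipped;
    }

    public static void main(String[] args) {
        WriteIntentSwitchSketch sketch = new WriteIntentSwitchSketch();
        sketch.addTable(20, txId -> System.out.println("Switched intents of " + txId + " in table 20"));
        sketch.addTable(21, txId -> System.out.println("Switched intents of " + txId + " in table 21"));

        // Table 21 is destroyed before the command is applied.
        sketch.removeTable(21);

        List<Integer> skipped = sketch.applySwitch("tx1", List.of(20, 21));
        System.out.println("Skipped tables: " + skipped);
    }
}
```

Returning the skipped table IDs is just one way to keep the partial application observable; whether skipping is safe at all is exactly the open question raised above.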

> With enabled colocation, write intent switch might fail due to table 
> destruction
> --------------------------------------------------------------------------------
>
>                 Key: IGNITE-26051
>                 URL: https://issues.apache.org/jira/browse/IGNITE-26051
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain, ignite-3
>


