[ 
https://issues.apache.org/jira/browse/IGNITE-23223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Puchkovskiy updated IGNITE-23223:
---------------------------------------
    Description: 
It looks like this:

2024-09-17 13:26:14:125 +0000 [ERROR][%poc-tester-SERVER-192.168.208.65-id-0%JRaft-FSMCaller-Disruptor_stripe_0-0][FailureManager] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=CRITICAL_ERROR]
java.lang.AssertionError: clusterId cannot be null when commands are already being executed by the CMG state machine
    at org.apache.ignite.internal.cluster.management.topology.LogicalTopologyImpl.requiredClusterId(LogicalTopologyImpl.java:133)
    at org.apache.ignite.internal.cluster.management.topology.LogicalTopologyImpl.putNode(LogicalTopologyImpl.java:114)
    at org.apache.ignite.internal.cluster.management.raft.CmgRaftGroupListener.completeValidation(CmgRaftGroupListener.java:257)
    at org.apache.ignite.internal.cluster.management.raft.CmgRaftGroupListener.onWriteBusy(CmgRaftGroupListener.java:173)
    at org.apache.ignite.internal.cluster.management.raft.CmgRaftGroupListener.onWrite(CmgRaftGroupListener.java:148)
    at org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:731)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:571)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:539)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:458)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:131)
    at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:125)
    at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:326)
    at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:283)
    at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122)
    at java.base/java.lang.Thread.run(Thread.java:829)

 

This is caused by a race: a node might start executing commands that modify 
the logical topology (all of which require a clusterId) before the clusterId 
is set.
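
To make the ordering problem concrete, here is a minimal, self-contained sketch. It is not Ignite code and not the fix for this ticket: the class and method names (ClusterIdRaceSketch, UnguardedTopology, GuardedTopology) are hypothetical stand-ins. The first class fails the same way as the trace above when a topology command arrives before the clusterId is set. The second shows one possible guard, assuming it is acceptable to defer a command until the clusterId becomes available.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

/** Illustration only: hypothetical stand-ins, not the Ignite classes. */
public class ClusterIdRaceSketch {
    /** Mimics the failing pattern: clusterId is a nullable field read by commands. */
    static class UnguardedTopology {
        private volatile UUID clusterId;
        private final List<String> nodes = new ArrayList<>();

        void setClusterId(UUID id) {
            clusterId = id;
        }

        void putNode(String nodeName) {
            // Same kind of check as in the stack trace: the command requires a clusterId.
            if (clusterId == null) {
                throw new AssertionError("clusterId cannot be null when commands are already being executed");
            }
            nodes.add(nodeName);
        }
    }

    /** One possible guard (an assumption): commands are chained on a future for the clusterId. */
    static class GuardedTopology {
        private final CompletableFuture<UUID> clusterId = new CompletableFuture<>();
        private final List<String> nodes = Collections.synchronizedList(new ArrayList<>());

        void setClusterId(UUID id) {
            clusterId.complete(id);
        }

        CompletableFuture<Void> putNode(String nodeName) {
            // The node is added only once the clusterId future has completed.
            return clusterId.thenAccept(id -> nodes.add(nodeName));
        }

        List<String> nodes() {
            return List.copyOf(nodes);
        }
    }

    /** Runs a command before the clusterId is set; returns true if the assertion fires. */
    public static boolean unguardedFailsWhenCommandArrivesFirst() {
        UnguardedTopology topology = new UnguardedTopology();
        try {
            topology.putNode("node-1");
            return false;
        } catch (AssertionError expected) {
            return true;
        }
    }

    /** Same ordering with the guard: the command stays pending until the clusterId arrives. */
    public static List<String> guardedNodesAfterLateClusterId() {
        GuardedTopology topology = new GuardedTopology();
        CompletableFuture<Void> pending = topology.putNode("node-1");
        if (pending.isDone()) {
            throw new IllegalStateException("command must not complete before clusterId is set");
        }
        topology.setClusterId(UUID.randomUUID());
        pending.join();
        return topology.nodes();
    }

    public static void main(String[] args) {
        System.out.println("unguarded fails: " + unguardedFailsWhenCommandArrivesFirst());
        System.out.println("guarded nodes: " + guardedNodesAfterLateClusterId());
    }
}
```

Whether deferring is appropriate inside a Raft state machine is a separate question. The sketch only shows that the failure depends on the order of the two calls, not on the commands themselves.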

> An NPE may happen in CMG state machine during init
> --------------------------------------------------
>
>                 Key: IGNITE-23223
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23223
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Blocker
>              Labels: ignite-3



