[ https://issues.apache.org/jira/browse/IGNITE-23223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882724#comment-17882724 ]
Roman Puchkovskiy commented on IGNITE-23223:
--------------------------------------------
Thanks!
> An NPE may happen in CMG state machine during init
> --------------------------------------------------
>
> Key: IGNITE-23223
> URL: https://issues.apache.org/jira/browse/IGNITE-23223
> Project: Ignite
> Issue Type: Bug
> Reporter: Roman Puchkovskiy
> Assignee: Roman Puchkovskiy
> Priority: Blocker
> Labels: ignite-3
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> It looks like this:
> 2024-09-17 13:26:14:125 +0000 [ERROR][%poc-tester-SERVER-192.168.208.65-id-0%JRaft-FSMCaller-Disruptor_stripe_0-0][FailureManager] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=CRITICAL_ERROR]
> java.lang.AssertionError: clusterId cannot be null when commands are already being executed by the CMG state machine
> at org.apache.ignite.internal.cluster.management.topology.LogicalTopologyImpl.requiredClusterId(LogicalTopologyImpl.java:133)
> at org.apache.ignite.internal.cluster.management.topology.LogicalTopologyImpl.putNode(LogicalTopologyImpl.java:114)
> at org.apache.ignite.internal.cluster.management.raft.CmgRaftGroupListener.completeValidation(CmgRaftGroupListener.java:257)
> at org.apache.ignite.internal.cluster.management.raft.CmgRaftGroupListener.onWriteBusy(CmgRaftGroupListener.java:173)
> at org.apache.ignite.internal.cluster.management.raft.CmgRaftGroupListener.onWrite(CmgRaftGroupListener.java:148)
> at org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:731)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:571)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:539)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:458)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:131)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:125)
> at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:326)
> at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:283)
> at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122)
> at java.base/java.lang.Thread.run(Thread.java:829)
>
> This is caused by a race: a node might start executing commands modifying
> logical topology (all of them require a clusterId) before the clusterId gets
> set.
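
The ordering problem described above can be illustrated with a small, self-contained sketch. This is not the Ignite code and not necessarily how the issue was fixed: the class TopologySketch and its methods (setClusterId, putNodeUnsafe, putNodeGated, nodes) are hypothetical names made up for illustration. It only shows one possible way to make a topology update wait until the cluster ID is known, using a CompletableFuture as a gate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

/** Hypothetical illustration of the race; not the Ignite implementation. */
final class TopologySketch {
    /** Completed once the cluster ID becomes known. */
    private final CompletableFuture<UUID> clusterIdFuture = new CompletableFuture<>();

    /** Names of the nodes added to the (simplified) logical topology. */
    private final List<String> nodes = new ArrayList<>();

    /** Called by the initialization path once the cluster ID is available. */
    void setClusterId(UUID clusterId) {
        clusterIdFuture.complete(clusterId);
    }

    /**
     * Unguarded variant: fails if a command arrives before setClusterId(),
     * which mirrors the assertion seen in the stack trace.
     */
    void putNodeUnsafe(String nodeName) {
        UUID clusterId = clusterIdFuture.getNow(null);
        if (clusterId == null) {
            throw new AssertionError("clusterId cannot be null when commands are already being executed");
        }
        addNode(nodeName);
    }

    /**
     * Gated variant: the update runs only after the cluster ID has been set.
     * If it is already set, the update runs immediately in the calling thread.
     */
    CompletableFuture<Void> putNodeGated(String nodeName) {
        return clusterIdFuture.thenAccept(clusterId -> addNode(nodeName));
    }

    private synchronized void addNode(String nodeName) {
        nodes.add(nodeName);
    }

    /** Returns a snapshot of the nodes added so far. */
    synchronized List<String> nodes() {
        return new ArrayList<>(nodes);
    }
}
```

Calling putNodeUnsafe("a") before setClusterId(...) throws, while putNodeGated("b") returns a future that completes only after setClusterId(...) is called. Note that deferring commands asynchronously inside a Raft state machine would affect apply ordering, so a real fix would more likely ensure that the clusterId is set before any such command can be applied; the sketch only demonstrates the required ordering.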