[
https://issues.apache.org/jira/browse/CASSANDRA-19794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868800#comment-17868800
]
Alex Petrov commented on CASSANDRA-19794:
-----------------------------------------
In short, the problem is that we do not remove the node from the CMS when it
leaves or is replaced, even though both operations unregister the node. As a
result, the diff we construct ends up containing a {{null}}.
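
To make the failure mode concrete, here is a minimal, self-contained sketch. It is not Cassandra code: the class {{CmsDiffSketch}} and all of its methods are hypothetical stand-ins, and it assumes that CMS membership is tracked by endpoint and translated to node ids through the directory when the diff is built. It only illustrates how a node that has been unregistered but is still a CMS member can surface as a {{null}} entry, and how a later lookup with that entry fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model, not the real Directory / CMS classes.
final class CmsDiffSketch {
    // Stand-in for the directory: endpoint -> node id, and node id -> endpoint.
    private final Map<String, Integer> idByEndpoint = new TreeMap<>();
    private final Map<Integer, String> endpointById = new TreeMap<>();
    // Stand-in for CMS membership, tracked by endpoint (an assumption).
    private final Set<String> cmsMembers = new TreeSet<>();

    void register(String endpoint, int nodeId, boolean inCms) {
        idByEndpoint.put(endpoint, nodeId);
        endpointById.put(nodeId, endpoint);
        if (inCms) cmsMembers.add(endpoint);
    }

    // Models the behaviour described above: the node is unregistered,
    // but it is left in the CMS membership set.
    void leaveKeepingCmsMembership(String endpoint) {
        Integer id = idByEndpoint.remove(endpoint);
        if (id != null) endpointById.remove(id);
    }

    // One plausible remedy: drop the node from the CMS before unregistering it.
    void leaveRemovingFromCms(String endpoint) {
        cmsMembers.remove(endpoint);
        leaveKeepingCmsMembership(endpoint);
    }

    // Node ids to remove from the CMS: current members missing from the target set.
    List<Integer> removalDiff(Set<String> targetMembers) {
        List<Integer> removals = new ArrayList<>();
        for (String member : cmsMembers) {
            if (!targetMembers.contains(member)) {
                // Returns null when the member is no longer registered.
                removals.add(idByEndpoint.get(member));
            }
        }
        return removals;
    }

    // A sorted map with natural ordering rejects null keys,
    // so a null node id throws NullPointerException here.
    String endpoint(Integer nodeId) {
        return endpointById.get(nodeId);
    }
}
```

In this sketch, after {{leaveKeepingCmsMembership}} the list returned by {{removalDiff}} contains a single {{null}}, and passing that entry to {{endpoint}} throws a {{NullPointerException}}. After {{leaveRemovingFromCms}} the diff is empty, so no lookup happens at all.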
> NPE on Directory access during Memtable flush fails ShortPaxosSimulationTest
> ----------------------------------------------------------------------------
>
> Key: CASSANDRA-19794
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19794
> Project: Cassandra
> Issue Type: Bug
> Components: Test/burn, Transactional Cluster Metadata
> Reporter: Caleb Rackliffe
> Assignee: Alex Petrov
> Priority: Normal
> Fix For: 5.x
>
>
> Run {{ShortPaxosSimulationTest}} with the following arguments on trunk:
> {noformat}
> PaxosSimulationRunner.main(new String[] { "run", "-n", "3..6", "-t", "1000",
> "-c", "2", "--cluster-action-limit", "2", "-s", "30", "--seed",
> "0xe0247e19a75e3bba" });
> {noformat}
> You should see a failure, starting with...
> {noformat}
> [junit-timeout] WARN [OptionalTasks:1] node5 2024-07-22 15:46:00,210 LegacyStateListener.java:158 - Token -6148914691236517205 changing ownership from /127.0.0.1:7012 to /127.0.0.6:7012
> [junit-timeout] WARN [OptionalTasks:1] node6 2024-07-22 15:46:00,259 SystemKeyspace.java:1287 - Using stored Gossip Generation 1577894856 as it is greater than current system time 1577894855. See CASSANDRA-3654 if you experience problems
> [junit-timeout] WARN [OptionalTasks:1] node6 2024-07-22 15:46:00,277 LegacyStateListener.java:158 - Token -6148914691236517205 changing ownership from /127.0.0.1:7012 to /127.0.0.6:7012
> [junit-timeout] ERROR [isolatedExecutor:3] node6 2024-07-22 15:46:00,469 ReconfigureCMS.java:184 - Could not finish adding the node to the Cluster Metadata Service
> [junit-timeout] java.lang.IllegalStateException: Can not commit transformation: "SERVER_ERROR"(class java.lang.NullPointerException).
> [junit-timeout] at org.apache.cassandra.tcm.ClusterMetadataService.lambda$commit$6(ClusterMetadataService.java:491)
> [junit-timeout] at org.apache.cassandra.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:535)
> [junit-timeout] at org.apache.cassandra.tcm.ClusterMetadataService.commit(ClusterMetadataService.java:488)
> [junit-timeout] at org.apache.cassandra.tcm.sequences.ReconfigureCMS.executeNext(ReconfigureCMS.java:179)
> [junit-timeout] at org.apache.cassandra.tcm.sequences.InProgressSequences.resume(InProgressSequences.java:200)
> [junit-timeout] at org.apache.cassandra.tcm.sequences.InProgressSequences.finishInProgressSequences(InProgressSequences.java:72)
> [junit-timeout] at org.apache.cassandra.tcm.ClusterMetadataService.reconfigureCMS(ClusterMetadataService.java:372)
> [junit-timeout] at org.apache.cassandra.tcm.ClusterMetadataService.ensureCMSPlacement(ClusterMetadataService.java:379)
> [junit-timeout] at org.apache.cassandra.tcm.sequences.BootstrapAndReplace.executeNext(BootstrapAndReplace.java:274)
> [junit-timeout] at org.apache.cassandra.simulator.cluster.OnClusterReplace$ExecuteNextStep.lambda$new$f5e64c00$1(OnClusterReplace.java:162)
> [junit-timeout] at org.apache.cassandra.distributed.api.IInvokableInstance.unsafeRunOnThisThread(IInvokableInstance.java:85)
> [junit-timeout] at org.apache.cassandra.simulator.systems.SimulatedActionTask.lambda$asSafeRunnable$0(SimulatedActionTask.java:83)
> [junit-timeout] at org.apache.cassandra.simulator.systems.SimulatedActionTask$1.run(SimulatedActionTask.java:93)
> [junit-timeout] at org.apache.cassandra.simulator.systems.InterceptingExecutor$InterceptingPooledExecutor$WaitingThread.lambda$new$1(InterceptingExecutor.java:318)
> [junit-timeout] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> [junit-timeout] at java.base/java.lang.Thread.run(Thread.java:829)
> {noformat}
> ...and underneath that...
> {noformat}
> [junit-timeout] Thread[ScheduledTasks:1,5,node3]
> [junit-timeout] java.lang.NullPointerException
> [junit-timeout] at org.apache.cassandra.utils.btree.AbstractBTreeMap.get(AbstractBTreeMap.java:92)
> [junit-timeout] at org.apache.cassandra.tcm.membership.Directory.endpoint(Directory.java:312)
> [junit-timeout] at org.apache.cassandra.tcm.transformations.cms.AdvanceCMSReconfiguration.executeRemove(AdvanceCMSReconfiguration.java:242)
> [junit-timeout] at org.apache.cassandra.tcm.transformations.cms.AdvanceCMSReconfiguration.execute(AdvanceCMSReconfiguration.java:123)
> [junit-timeout] at org.apache.cassandra.tcm.sequences.ReconfigureCMS.applyTo(ReconfigureCMS.java:149)
> [junit-timeout] at org.apache.cassandra.tcm.ClusterMetadata.writePlacementAllSettled(ClusterMetadata.java:275)
> [junit-timeout] at org.apache.cassandra.db.DiskBoundaryManager.getLocalRanges(DiskBoundaryManager.java:158)
> [junit-timeout] at org.apache.cassandra.db.DiskBoundaryManager.getDiskBoundaryValue(DiskBoundaryManager.java:121)
> [junit-timeout] at org.apache.cassandra.db.DiskBoundaryManager.getDiskBoundaries(DiskBoundaryManager.java:65)
> [junit-timeout] at org.apache.cassandra.db.ColumnFamilyStore.getDiskBoundaries(ColumnFamilyStore.java:3676)
> [junit-timeout] at org.apache.cassandra.db.compaction.CompactionStrategyManager.maybeReloadDiskBoundaries(CompactionStrategyManager.java:587)
> [junit-timeout] at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:899)
> [junit-timeout] at org.apache.cassandra.db.lifecycle.Tracker.notify(Tracker.java:558)
> [junit-timeout] at org.apache.cassandra.db.lifecycle.Tracker.notifySwitched(Tracker.java:547)
> [junit-timeout] at org.apache.cassandra.db.lifecycle.Tracker.switchMemtable(Tracker.java:390)
> [junit-timeout] at org.apache.cassandra.db.ColumnFamilyStore$Flush.<init>(ColumnFamilyStore.java:1248)
> [junit-timeout] at org.apache.cassandra.db.ColumnFamilyStore.switchMemtable(ColumnFamilyStore.java:1074)
> [junit-timeout] at org.apache.cassandra.db.ColumnFamilyStore.switchMemtableIfCurrent(ColumnFamilyStore.java:1055)
> [junit-timeout] at org.apache.cassandra.db.ColumnFamilyStore.signalFlushRequired(ColumnFamilyStore.java:1482)
> [junit-timeout] at org.apache.cassandra.db.memtable.AbstractAllocatorMemtable.flushIfPeriodExpired(AbstractAllocatorMemtable.java:240)
> [junit-timeout] at org.apache.cassandra.db.memtable.AbstractAllocatorMemtable$1.runMayThrow(AbstractAllocatorMemtable.java:221)
> [junit-timeout] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:26)
> [junit-timeout] at org.apache.cassandra.simulator.systems.SimulatedExecution$1.call(SimulatedExecution.java:212)
> [junit-timeout] at org.apache.cassandra.concurrent.SyncFutureTask.run(SyncFutureTask.java:68)
> [junit-timeout] at org.apache.cassandra.simulator.systems.InterceptingExecutor$AbstractSingleThreadedExecutorPlus.lambda$new$0(InterceptingExecutor.java:585)
> [junit-timeout] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> [junit-timeout] at java.base/java.lang.Thread.run(Thread.java:829)
> {noformat}
> Reverting the changes from CASSANDRA-19705 allows the test to complete
> successfully, which makes sense, as {{ensureCMSPlacement()}} shows up in the
> trace above.