[
https://issues.apache.org/jira/browse/CASSANDRA-18636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738471#comment-17738471
]
Stefan Miklosovic commented on CASSANDRA-18636:
-----------------------------------------------
I think this might help
https://github.com/apache/cassandra/pull/2457/files
> AssertionError: Unknown keyspace after dropping a keyspace
> ----------------------------------------------------------
>
> Key: CASSANDRA-18636
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18636
> Project: Cassandra
> Issue Type: Bug
> Reporter: Ke Han
> Priority: Normal
> Attachments: system.log
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When running Cassandra cluster (3 nodes) in 3.11.15, I noticed that there is
> an ERROR message in system.log after drop keyspace command is executed.
> {code:java}
> INFO [Native-Transport-Requests-1] 2023-06-26 06:11:48,575
> MigrationManager.java:205 - Create new Keyspace:
> KeyspaceMetadata{name=uuide1b80964f06a424c8a57884564127745,
> params=KeyspaceParams{durable_writes=true,
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
> replication_factor=2}}, tables=[], views=[], functions=[], types=[]}
> INFO [Native-Transport-Requests-1] 2023-06-26 06:11:53,106
> MigrationManager.java:205 - Create new Keyspace:
> KeyspaceMetadata{name=uuidd17c2b7bfea545d0801a3c05e0e5307a,
> params=KeyspaceParams{durable_writes=true,
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
> replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
> INFO [Native-Transport-Requests-2] 2023-06-26 06:11:56,673
> MigrationManager.java:235 - Create new table:
> org.apache.cassandra.config.CFMetaData@705b690b[cfId=5a7fcf10-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuidd17c2b7bfea545d0801a3c05e0e5307a,cfName=vzkzguyh,flags=[COMPOUND],params=TableParams{comment=,
> read_repair_chance=0.0, dclocal_read_repair_chance=0.1,
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000,
> default_time_to_live=0, memtable_flush_period_in_ms=0,
> min_index_interval=128, max_index_interval=2048, speculative_retry=50.0ms,
> caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'},
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
> options={min_threshold=4, max_threshold=32}},
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342,
> extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] |
> [condcduzca o ukwhjwbdljqnz
> vzkzguyh]],partitionKeyColumns=[gveohmecfdwx],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.UTF8Type,columnMetadata=[gveohmecfdwx,
> condcduzca, vzkzguyh, o,
> ukwhjwbdljqnz],droppedColumns={},triggers=[],indexes=[]]
> INFO [MigrationStage:1] 2023-06-26 06:12:03,118 ColumnFamilyStore.java:432 -
> Initializing uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh
> INFO [Native-Transport-Requests-9] 2023-06-26 06:12:04,634
> MigrationManager.java:235 - Create new table:
> org.apache.cassandra.config.CFMetaData@61f10dd7[cfId=5f3e8fa0-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuidd17c2b7bfea545d0801a3c05e0e5307a,cfName=o,flags=[COMPOUND],params=TableParams{comment=,
> read_repair_chance=0.0, dclocal_read_repair_chance=0.1,
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000,
> default_time_to_live=0, memtable_flush_period_in_ms=0,
> min_index_interval=128, max_index_interval=2048, speculative_retry=ALWAYS,
> caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'},
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
> options={min_threshold=4, max_threshold=32}},
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342,
> extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] |
> [dzwqnsv gveohmecfdwx o
> ukwhjwbdljqnz]],partitionKeyColumns=[vzkzguyh],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[dzwqnsv,
> vzkzguyh, gveohmecfdwx, o,
> ukwhjwbdljqnz],droppedColumns={},triggers=[],indexes=[]]
> INFO [MigrationStage:1] 2023-06-26 06:12:11,862 ColumnFamilyStore.java:432 -
> Initializing uuidd17c2b7bfea545d0801a3c05e0e5307a.o
> INFO [Native-Transport-Requests-6] 2023-06-26 06:12:13,165
> MigrationManager.java:205 - Create new Keyspace:
> KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44,
> params=KeyspaceParams{durable_writes=true,
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
> replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
> INFO [MigrationStage:1] 2023-06-26 06:12:20,413
> MigrationCoordinator.java:531 - Sending schema pull request to /192.168.103.3
> at 1687759940413 with timeout 4287
> INFO [Native-Transport-Requests-3] 2023-06-26 06:12:21,270
> MigrationManager.java:286 - Update Keyspace
> 'uuid1602df26aef747d6af8f6d0c36fb1a44' >From
> KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44,
> params=KeyspaceParams{durable_writes=true,
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
> replication_factor=1}}, tables=[], views=[], functions=[],
> types=[org.apache.cassandra.db.marshal.UserType(uuid1602df26aef747d6af8f6d0c36fb1a44,756c76796163,71:org.apache.cassandra.db.marshal.UTF8Type,756c76796163:org.apache.cassandra.db.marshal.Int32Type)]}
> To KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44,
> params=KeyspaceParams{durable_writes=true,
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
> replication_factor=3}}, tables=[], views=[], functions=[],
> types=[org.apache.cassandra.db.marshal.UserType(uuid1602df26aef747d6af8f6d0c36fb1a44,756c76796163,71:org.apache.cassandra.db.marshal.UTF8Type,756c76796163:org.apache.cassandra.db.marshal.Int32Type)]}
> INFO [Native-Transport-Requests-1] 2023-06-26 06:12:24,646
> MigrationManager.java:235 - Create new table:
> org.apache.cassandra.config.CFMetaData@132ca1e[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
> read_repair_chance=0.0, dclocal_read_repair_chance=0.1,
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000,
> default_time_to_live=0, memtable_flush_period_in_ms=0,
> min_index_interval=128, max_index_interval=2048,
> speculative_retry=40PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition'
> : 'NONE'},
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
> options={min_threshold=4, max_threshold=32}},
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342,
> extensions={},
> cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
> |
> [xwtypuqy]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[xwtypuqy,
> jsxcfdobqbmirwxs, fkbelwg],droppedColumns={},triggers=[],indexes=[]]
> INFO [MigrationStage:1] 2023-06-26 06:12:31,863 ColumnFamilyStore.java:432 -
> Initializing uuid1602df26aef747d6af8f6d0c36fb1a44.q
> INFO [MutationStage-5] 2023-06-26 06:12:32,571 ColumnFamilyStore.java:2250 -
> Truncating uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh
> INFO [MutationStage-5] 2023-06-26 06:12:35,128 ColumnFamilyStore.java:2304 -
> Truncate of uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh is complete
> INFO [MutationStage-4] 2023-06-26 06:12:35,134 ColumnFamilyStore.java:2250 -
> Truncating uuid1602df26aef747d6af8f6d0c36fb1a44.q
> INFO [MutationStage-4] 2023-06-26 06:12:37,524 ColumnFamilyStore.java:2304 -
> Truncate of uuid1602df26aef747d6af8f6d0c36fb1a44.q is complete
> INFO [Native-Transport-Requests-4] 2023-06-26 06:12:37,529
> MigrationManager.java:362 - Drop Keyspace
> 'uuidd17c2b7bfea545d0801a3c05e0e5307a'
> INFO [Native-Transport-Requests-1] 2023-06-26 06:13:07,622
> MigrationManager.java:313 - Update table
> 'uuid1602df26aef747d6af8f6d0c36fb1a44/q' From
> org.apache.cassandra.config.CFMetaData@602b98fd[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
> read_repair_chance=0.0, dclocal_read_repair_chance=0.1,
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000,
> default_time_to_live=0, memtable_flush_period_in_ms=0,
> min_index_interval=128, max_index_interval=2048,
> speculative_retry=40PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition'
> : 'NONE'},
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
> options={max_threshold=32, min_threshold=4}},
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342,
> extensions={},
> cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
> |
> [xwtypuqy]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[xwtypuqy,
> jsxcfdobqbmirwxs, fkbelwg],droppedColumns={},triggers=[],indexes=[]] To
> org.apache.cassandra.config.CFMetaData@cfd4a9a[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
> read_repair_chance=0.0, dclocal_read_repair_chance=0.1,
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000,
> default_time_to_live=0, memtable_flush_period_in_ms=0,
> min_index_interval=128, max_index_interval=2048,
> speculative_retry=40PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition'
> : 'NONE'},
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
> options={max_threshold=32, min_threshold=4}},
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342,
> extensions={},
> cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
> |
> []],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[jsxcfdobqbmirwxs,
> fkbelwg],droppedColumns={java.nio.HeapByteBuffer[pos=0 lim=8
> cap=8]=DroppedColumn{name=xwtypuqy, kind=REGULAR,
> type=org.apache.cassandra.db.marshal.Int32Type,
> droppedTime=1687759987622000}},triggers=[],indexes=[]]
> ERROR [OptionalTasks:1] 2023-06-26 06:13:14,070 CassandraDaemon.java:244 -
> Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.AssertionError: Unknown keyspace
> uuidd17c2b7bfea545d0801a3c05e0e5307a
> at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
> at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
> at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92)
> at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89)
> at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
> at
> com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
> at
> org.apache.cassandra.db.SizeEstimatesRecorder.run(SizeEstimatesRecorder.java:72)
> at
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
> at java.lang.Thread.run(Thread.java:750)
> {code}
>
> From my understanding, this is a concurrency bug. The ERROR message comes
> from *_SizeEstimatesRecorder_* thread when it tries to init and iterate the
> keyspaces to get the size information. This thread might be invoked from time
> to time, controlled by DebuggableScheduledThreadPoolExecutor.
> The error message occurs when there's a thread interleaving between
> # Thread1: *_SizeEstimatesRecorder_* thread and
> # Thread2: Thread performing {_}drop keyspace{_}. (mutation or migration
> thread)
> In short, thread1 fetches the keyspace names => thread2 updates the schema by
> removing a specific keyspace => thread1 tries to fetch the schema from the
> dropped keyspace, and error message is shown.
>
> More specifically, here's how it happens from my preliminary analysis:
> (1) Thread1 tries to iterate all the keyspaces from
> {code:java}
> public void run()
> {
> TokenMetadata metadata =
> StorageService.instance.getTokenMetadata().cloneOnlyTokenMap();
> if (!metadata.isMember(FBUtilities.getBroadcastAddress()))
> {
> logger.debug("Node is not part of the ring; not recording size
> estimates");
> return;
> } logger.trace("Recording size estimates"); for (Keyspace keyspace
> : Keyspace.nonLocalStrategy())
> {{code}
> It invokes _Schema.instance.getNonLocalStrategyKeyspaces()_ in
> _nonLocalStrategy_ function
> {code:java}
> public static Iterable<Keyspace> nonLocalStrategy()
> {
> return
> Iterables.transform(Schema.instance.getNonLocalStrategyKeyspaces(),
> keyspaceTransformer);
> }
> {code}
> getNonLocalStrategyKeyspaces returns a list of *keyspace names* from current
> schema.
> {code:java}
> /**
> * @return a collection of keyspaces that do not use LocalStrategy for
> replication
> */
> public List<String> getNonLocalStrategyKeyspaces()
> {
> return keyspaces.values().stream()
> .filter(keyspace -> keyspace.params.replication.klass !=
> LocalStrategy.class)
> .map(keyspace -> keyspace.name)
> .collect(Collectors.toList());
> } {code}
> (2) Thread2 performs the _drop keyspace_ and {_}updates the schema{_}.
> (3) Thread1: for each keyspace name, it will invoke the <init> function of
> Keyspace, when it tries to fetch the metadata from the dropped keyspace name,
> it gets null and throws and ERROR exception.
> {code:java}
> private Keyspace(String keyspaceName, boolean loadSSTables)
> {
> metadata = Schema.instance.getKSMetaData(keyspaceName);
> assert metadata != null : "Unknown keyspace " + keyspaceName;
> createReplicationStrategy(metadata); {code}
>
> This error message won't stop Cassandra from running. But I think this could
> be handled considering its ERROR level. E.g. if a user wants to perform a
> migration from 3.x to 4.x, [Cassandra migration
> doc|https://www.datastax.com/learn/whats-new-for-cassandra-4/migrating-cassandra-4x]
> mentions that "All errors need to be handled". This error message could
> confuse users about whether the data is still in an integrated state.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]