YCozy created CASSANDRA-15758:
---------------------------------
Summary: ERROR when a disconnected Cassandra node comes back and
receives a drop/add column request
Key: CASSANDRA-15758
URL: https://issues.apache.org/jira/browse/CASSANDRA-15758
Project: Cassandra
Issue Type: Bug
Reporter: YCozy
We got the following error while dropping a column from a table:
{code:java}
ERROR [MigrationStage:1] 2020-04-24 00:07:54,995 SchemaKeyspace.java:1021 - No partition columns found for table ks_name.tbl_name in system_schema.columns. This may be due to corruption or concurrent dropping and altering of a table. If this table is supposed to be dropped, restart cassandra with -Dcassandra.ignore_corrupted_schema_tables=true and run the following query to cleanup: "DELETE FROM system_schema.tables WHERE keyspace_name = 'ks_name' AND table_name = 'tbl_name'; DELETE FROM system_schema.columns WHERE keyspace_name = 'ks_name' AND table_name = 'tbl_name';" If the table is not supposed to be dropped, restore system_schema.columns sstables from backups.
ERROR [MigrationStage:1] 2020-04-25 15:21:55,716 CassandraDaemon.java:228 - Exception in thread Thread[MigrationStage:1,5,main]
org.apache.cassandra.schema.SchemaKeyspace$MissingColumns: Columns not found in schema table for ks_name.tbl_name
    at org.apache.cassandra.schema.SchemaKeyspace.fetchColumns(SchemaKeyspace.java:1100) ~[main/:na]
    at org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:1046) ~[main/:na]
    at org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:1000) ~[main/:na]
    at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:959) ~[main/:na]
    at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesOnly(SchemaKeyspace.java:951) ~[main/:na]
    at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1401) ~[main/:na]
    at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1380) ~[main/:na]
    at org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:51) ~[main/:na]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_242]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_242]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_242]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_242]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [main/:na]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_242]
{code}
We analyzed the logs and came up with the following theory of what happened:
# We have a cluster of three nodes (C1, C2, C3).
# Right after we start all the nodes, C3 is partitioned away from the others.
As a result, neither C1 nor C2 knows that C3 exists.
# A user contacts C1 to create a keyspace "ks_name" and a table "tbl_name". C1
and C2 serve the requests. Since they don't know about C3, they think the
schema is consistent across the cluster. Both the keyspace and the table are
created successfully without any warning.
# The user then tries to drop a column from the table. At this point C3
reconnects and receives the drop column request from C1 (the coordinator
node). However, C3 knows about neither "ks_name" nor "tbl_name", so it throws
the above error.
# If the user tries to add a column instead of dropping one, the same error
will occur.
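
The steps above can be sketched as a toy model. This is not Cassandra code: the Node class, its fields, and the delta dictionaries are hypothetical. It also assumes that an ALTER request carries only a delta rather than the full table definition, which is our reading of the logs and not something we have confirmed.

```python
class MissingColumns(Exception):
    """Raised when a table is known but has no partition key column."""


class Node:
    def __init__(self, name):
        self.name = name
        self.tables = set()   # stands in for system_schema.tables
        self.columns = {}     # stands in for system_schema.columns

    def apply(self, delta):
        """Merge a schema delta, then re-read the table it touched."""
        key = (delta["keyspace"], delta["table"])
        self.tables.add(key)
        cols = self.columns.setdefault(key, {})
        cols.update(delta.get("add", {}))
        for column in delta.get("drop", []):
            cols.pop(column, None)
        self.fetch_table(key)

    def fetch_table(self, key):
        cols = self.columns.get(key, {})
        if "partition_key" not in cols.values():
            raise MissingColumns(
                "Columns not found in schema table for %s.%s" % key)
        return cols


def run_scenario(alter):
    """Replay steps 1 to 5 and report how each node handles the ALTER."""
    c1, c2, c3 = Node("C1"), Node("C2"), Node("C3")
    target = {"keyspace": "ks_name", "table": "tbl_name"}
    create = dict(target, add={"id": "partition_key",
                               "v1": "regular",
                               "v2": "regular"})
    # C3 is partitioned away, so only C1 and C2 see the CREATE.
    for node in (c1, c2):
        node.apply(create)
    # C3 is back by the time the ALTER is sent to every node.
    outcome = {}
    for node in (c1, c2, c3):
        try:
            node.apply(dict(target, **alter))
            outcome[node.name] = "ok"
        except MissingColumns:
            outcome[node.name] = "MissingColumns"
    return outcome


drop_result = run_scenario({"drop": ["v2"]})
add_result = run_scenario({"add": {"v3": "regular"}})
print(drop_result)
print(add_result)
```

In this model both the drop and the add leave C3 with a table entry but no partition key column, which matches the "No partition columns found" message in the log.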
Since network partitions are inevitable in deployed clusters, we think
Cassandra should handle this scenario more gracefully.
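
As one illustration of what better handling could look like, the sketch below has a node check whether a delta refers to a table it has never seen and, if so, copy the sender's full definition instead of merging the delta. This is only a sketch under our own assumptions: the Node class, the apply() signature, and the "pulled"/"deferred" results are hypothetical and do not describe Cassandra's actual behavior or API.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.columns = {}   # (keyspace, table) -> {column name: kind}

    @staticmethod
    def is_self_contained(delta):
        """A delta that defines a partition key can create the table itself."""
        return "partition_key" in delta.get("add", {}).values()

    def apply(self, delta, sender=None):
        key = (delta["keyspace"], delta["table"])
        if key not in self.columns and not self.is_self_contained(delta):
            # Unknown table and only a partial delta: do not merge it.
            if sender is None or key not in sender.columns:
                return "deferred"
            # Hypothetical full pull: copy the sender's current definition,
            # which already reflects the delta.
            self.columns[key] = dict(sender.columns[key])
            return "pulled"
        cols = self.columns.setdefault(key, {})
        cols.update(delta.get("add", {}))
        for column in delta.get("drop", []):
            cols.pop(column, None)
        return "merged"


target = {"keyspace": "ks_name", "table": "tbl_name"}
create = dict(target, add={"id": "partition_key", "v1": "regular",
                           "v2": "regular"})
drop = dict(target, drop=["v2"])

c1, c3 = Node("C1"), Node("C3")
created = c1.apply(create)           # C3 is partitioned and misses this
dropped = c1.apply(drop)
recovered = c3.apply(drop, sender=c1)
orphaned = Node("C4").apply(drop)    # no sender to pull from
print(created, dropped, recovered, orphaned)
```

Whether a full pull, a deferred retry, or something else is appropriate is a design question for the project. The sketch only shows that the gap can be detected before the merge is attempted.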