Charlemange Lasse created CASSANDRA-15298:
---------------------------------------------
Summary: Cassandra node cannot be restored using documented backup method
Key: CASSANDRA-15298
URL: https://issues.apache.org/jira/browse/CASSANDRA-15298
Project: Cassandra
Issue Type: Bug
Reporter: Charlemange Lasse
I have a single Cassandra 3.11.4 node. It contains various tables and UDFs. The [documentation describes a method to back up this node|https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsBackupTakesSnapshot.html]:
* use "DESCRIBE SCHEMA" in cqlsh to get the schema
* create a snapshot using nodetool
* copy the snapshot + schema to a new (completely disconnected) node
* load schema into new node
* load sstables again using nodetool
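For reference, here is a minimal shell sketch of that procedure (untested; the snapshot tag, keyspace/table names, and data path are examples, not taken from the documentation):
{noformat}
# 1. Dump the full schema on the source node.
cqlsh -e "DESCRIBE SCHEMA" > schema.cql

# 2. Snapshot the keyspace.
nodetool snapshot -t mysnapshot mykeyspace

# 3. Copy schema.cql plus the snapshot directories, e.g.
#    /var/lib/cassandra/data/mykeyspace/<table>-<id>/snapshots/mysnapshot/,
#    to the new, completely disconnected node.

# 4. Load the schema on the new node.
cqlsh -f schema.cql

# 5. Place the sstables into the matching table directories, then load them.
nodetool refresh mykeyspace testcf
{noformat}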
But this method is completely bogus. It results in errors like:
{noformat}
java.lang.RuntimeException: Unknown column deleted_column during deserialization
{noformat}
All data in that column is then lost.
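For illustration, a minimal reproduction sketch (untested; it uses the example table shown further below):
{noformat}
# Write data into a column, make sure it reaches an sstable, then drop it.
cqlsh -e "INSERT INTO mykeyspace.testcf
          (primary_uuid, secondary_uuid, name, deleted_column)
          VALUES (uuid(), uuid(), 'x', true);"
nodetool flush mykeyspace
cqlsh -e "ALTER TABLE mykeyspace.testcf DROP deleted_column;"

# Snapshot and restore on a fresh node as described above. Reading the
# restored sstables fails with the exception above, because the restored
# schema contains no record of the dropped column.
{noformat}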
The problem is that the "DESCRIBE SCHEMA" output does not correctly record columns that have been dropped but whose data still exists in the sstables. Its output looks, for example, like:
{noformat}
CREATE TABLE mykeyspace.testcf (
    primary_uuid uuid,
    secondary_uuid uuid,
    name text,
    PRIMARY KEY (primary_uuid, secondary_uuid)
) WITH CLUSTERING ORDER BY (secondary_uuid ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
{noformat}
But it must actually look like:
{noformat}
CREATE TABLE IF NOT EXISTS mykeyspace.testcf (
    primary_uuid uuid,
    secondary_uuid uuid,
    name text,
    deleted_column boolean,
    PRIMARY KEY (primary_uuid, secondary_uuid)
) WITH ID = a1afdd4d-b61e-4f2a-b806-57c296be3948
    AND CLUSTERING ORDER BY (secondary_uuid ASC)
    AND bloom_filter_fp_chance = 0.01
    AND dclocal_read_repair_chance = 0.1
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND min_index_interval = 128
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE'
    AND comment = ''
    AND caching = { 'keys': 'ALL', 'rows_per_partition': 'NONE' }
    AND compaction = { 'max_threshold': '32', 'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' }
    AND compression = { 'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor' }
    AND cdc = false
    AND extensions = { };

ALTER TABLE mykeyspace.testcf DROP deleted_column USING TIMESTAMP 1563978151561000;
{noformat}
This was taken from the snapshot's (per-table) schema.cql. That file is of course not compatible with the main schema dump: it only creates the table when it does not already exist (which it does, because the main "DESCRIBE SCHEMA" file creates it first), and it is missing everything else, such as UDFs.
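A possible workaround sketch (untested; paths and the snapshot tag are examples, and it assumes each statement in schema.cql sits on a single line) is to pull the drop records out of the per-table schema.cql files and append them to the "DESCRIBE SCHEMA" dump before loading it on the new node:
{noformat}
# Collect the ALTER ... DROP ... USING TIMESTAMP statements from every
# snapshot schema.cql and append them to the main schema dump.
grep -h '^ALTER TABLE .* DROP .* USING TIMESTAMP' \
    /var/lib/cassandra/data/mykeyspace/*/snapshots/mysnapshot/schema.cql \
    >> schema.cql
{noformat}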
It is currently not possible, using the built-in mechanisms of Cassandra 3.11.4, to migrate a keyspace from one isolated server to another.
This behavior also breaks various backup systems that try to store Cassandra cluster information in offline storage.