[
https://issues.apache.org/jira/browse/CASSANDRA-21000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18053829#comment-18053829
]
Michael Semb Wever edited comment on CASSANDRA-21000 at 1/23/26 8:51 AM:
-------------------------------------------------------------------------
bq. The database state itself, however, should not forget dropped column
definitions, because this may cause problems when the column is recreated with
a different type.
There are a number of serious (data-corrupting, data-loss, availability) edge
case bugs around this, particularly with implicitly frozen, explicitly frozen
and non-frozen collections/UDTs, different sstable formats, dropped compact
sstables, dropping columns and re-adding them, static columns, schema changes
during mixed-version operation, etc. We've also seen that changing
SerializationHeader bytes without bumping the sstable format causes bugs (and
shouldn't be done).
While the discussion thread above is heading in the right direction (e.g.
system schema vs SerializationHeader), there are so many combinations here
(e.g. a new node restarted after wiping and refetching its system schema while
it holds sstables with re-added, wrongly-frozen UDTs) that need to be checked
before my concerns settle. Off the top of my head, related tickets:
CASSANDRA-21050, CASSANDRA-16733, CASSANDRA-20485, CASSANDRA-20394,
CASSANDRA-12582, CASSANDRA-12236, CASSANDRA-12697, CASSANDRA-12705,
CASSANDRA-11988, … At a quick glance I don't see them being directly
applicable (some were even fixed before 4.0, though 3.x sstables can still be
present in new clusters or come from backups), but it's mostly about the
complexities they exemplify…
> Deleted columns are forever part of SerializationHeader
> -------------------------------------------------------
>
> Key: CASSANDRA-21000
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21000
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Local/Compaction
> Reporter: Cameron Zemek
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> If you delete a column and rewrite the SSTable the column is removed from the
> data, but the serialization header refers to the deleted column still. This
> means if you drop a column and rewrite sstables (eg. nodetool upgradesstables
> -a) and that column is not in use, you still can not import or load those
> SSTables into another cluster without also having to add/drop columns.
>
> {noformat}
> ~/.ccm/test/node1/data0/test $ ~/bin/cqlsh
> Connected to repairtest at 127.0.0.1:9042
> [cqlsh 6.2.0 | Cassandra 5.0.5-SNAPSHOT | CQL spec 3.4.7 | Native protocol v5]
> Use HELP for help.
> cqlsh> CREATE TABLE test.drop_test(id int primary key, message text,
> col_to_delete text);
> cqlsh> INSERT INTO test.drop_test(id, message, col_to_delete) VALUES (1,
> 'test', 'delete me');
> cqlsh> SELECT * FROM test.drop_test;
> id | col_to_delete | message
> ----+---------------+---------
> 1 | delete me | test
> (1 rows)
> ~/.ccm/test/node1/data0/test $ ccm flush
> ~/.ccm/test/node1/data0/test $ cd drop_test-7a20f690ba8611f09c6c3125f1cbdf37
> ~/.ccm/test/node1/data0/test $ ls
> nb-1-big-CompressionInfo.db nb-1-big-Digest.crc32 nb-1-big-Index.db
> nb-1-big-Summary.db
> nb-1-big-Data.db nb-1-big-Filter.db nb-1-big-Statistics.db
> nb-1-big-TOC.txt
> ~/.ccm/test/node1/data0/test $ ~/.ccm/repository/5.0.3/tools/bin/sstabledump
> nb-1-big-Data.db
> [
> {
> "table kind" : "REGULAR",
> "partition" : {
> "key" : [ "1" ],
> "position" : 0
> },
> "rows" : [
> {
> "type" : "row",
> "position" : 18,
> "liveness_info" : { "tstamp" : "2025-11-05T20:32:17.946616Z" },
> "cells" : [
> { "name" : "col_to_delete", "value" : "delete me" },
> { "name" : "message", "value" : "test" }
> ]
> }
> ]
> }
> ]%
> ~/.ccm/test/node1/data0/test $ ~/bin/cqlsh
> Connected to repairtest at 127.0.0.1:9042
> [cqlsh 6.2.0 | Cassandra 5.0.5-SNAPSHOT | CQL spec 3.4.7 | Native protocol v5]
> Use HELP for help.
> cqlsh> ALTER TABLE test.drop_test DROP col_to_delete;
> cqlsh> SELECT * FROM test.drop_test;
> id | message
> ----+---------
> 1 | test
> (1 rows)
> ~/.ccm/test/node1/data0/test $ ccm node1 nodetool upgradesstables -- -a test
> drop_test
> ~/.ccm/test/node1/data0/test $ ls
> nb-2-big-CompressionInfo.db nb-2-big-Digest.crc32 nb-2-big-Index.db
> nb-2-big-Summary.db
> nb-2-big-Data.db nb-2-big-Filter.db nb-2-big-Statistics.db
> nb-2-big-TOC.txt
> ~/.ccm/test/node1/data0/test $ ~/.ccm/repository/5.0.3/tools/bin/sstabledump
> nb-2-big-Data.db
> [
> {
> "table kind" : "REGULAR",
> "partition" : {
> "key" : [ "1" ],
> "position" : 0
> },
> "rows" : [
> {
> "type" : "row",
> "position" : 18,
> "liveness_info" : { "tstamp" : "2025-11-05T20:32:17.946616Z" },
> "cells" : [
> { "name" : "message", "value" : "test" }
> ]
> }
> ]
> }
> ]%
> ~/.ccm/test/node1/data0/test $
> ~/.ccm/repository/5.0.3/tools/bin/sstablemetadata nb-2-big-Data.db | grep -E
> 'StaticColumns|RegularColumns'
> StaticColumns:
> RegularColumns: col_to_delete:org.apache.cassandra.db.marshal.UTF8Type,
> message:org.apache.cassandra.db.marshal.UTF8Type{noformat}
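
For anyone wanting to check their own sstables for the situation shown in the description, here is a minimal Python sketch. It is an illustration only, not part of the ticket or of any Cassandra tooling. It assumes the `RegularColumns:` / `StaticColumns:` lines have the `name:type, name:type` shape shown in the sstablemetadata output above, and that the set of live column names is supplied by the caller. The function names `parse_header_columns` and `stale_columns` are hypothetical.

```python
def parse_header_columns(line):
    """Parse a line such as 'RegularColumns: a:TypeA, b:TypeB' into a dict.

    Assumption: entries are comma-separated 'name:type' pairs, as in the
    sstablemetadata output quoted in the issue. Commas nested inside
    parentheses (parameterised types) are not treated as separators.
    """
    _, _, body = line.partition(":")  # drop the 'RegularColumns' label
    columns = {}
    depth = 0
    current = ""
    for ch in body + ",":  # trailing comma flushes the last entry
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            name, _, type_name = current.strip().partition(":")
            if name:
                columns[name] = type_name
            current = ""
        else:
            current += ch
    return columns


def stale_columns(header_line, live_columns):
    """Return header column names that are absent from the live schema."""
    header = parse_header_columns(header_line)
    return sorted(set(header) - set(live_columns))


if __name__ == "__main__":
    # Header line taken from the issue description (joined onto one line).
    line = ("RegularColumns: col_to_delete:org.apache.cassandra.db.marshal.UTF8Type, "
            "message:org.apache.cassandra.db.marshal.UTF8Type")
    # Live columns after ALTER TABLE test.drop_test DROP col_to_delete.
    print(stale_columns(line, {"id", "message"}))
```

With the values from the description, the only column reported is `col_to_delete`, i.e. the dropped column that the rewritten sstable's header still references.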