[
https://issues.apache.org/jira/browse/CASSANDRA-18728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ke Han updated CASSANDRA-18728:
-------------------------------
Bug Category: Parent values: Correctness(12982)
> [Transient Bug] Incorrect ByteBuffer representation of ColumnIdentifiers when
> 3.11.15 loading legacy data from 2.x
> ------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-18728
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18728
> Project: Cassandra
> Issue Type: Bug
> Reporter: Ke Han
> Priority: Normal
> Attachments: data.tar.gz, system.log
>
>
> h1. Bug Description
> When using Cassandra 3.11.15 to load legacy data from 2.2.10, I noticed that
> the byte representation of the column identifier is incorrect.
> The legacy data contain two tables, and the schema is as follows.
> {code:java}
> cqlsh> desc test.alpha ;
> CREATE TABLE test.alpha (
> key text PRIMARY KEY,
> foo text
> ) WITH COMPACT STORAGE
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'min_threshold': '4', 'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = 'NONE';
> cqlsh> DESC test.foos ;
> CREATE TABLE test.foos (
> key text PRIMARY KEY,
> "666f6f" text
> ) WITH COMPACT STORAGE
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'min_threshold': '4', 'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = 'NONE';
> CREATE INDEX idx_foo ON test.foos ("666f6f"); {code}
> There exists a column in test.foos with {*}name = "666f6f"{*}; the
> corresponding byte representation should be Hex(666f6f) ==
> {*}363636663666{*}. However, when 3.11.15 loads the data and creates the
> column, the value stored in the ByteBuffer is still "666f6f".
> {code:java}
> // src/java/org/apache/cassandra/schema/SchemaKeyspace.java
> public static ColumnDefinition createColumnFromRow(UntypedResultSet.Row row, Types types)
> {
>     String keyspace = row.getString("keyspace_name");
>     String table = row.getString("table_name");
>     ColumnDefinition.Kind kind = ColumnDefinition.Kind.valueOf(row.getString("kind").toUpperCase());
>     int position = row.getInt("position");
>     ClusteringOrder order = ClusteringOrder.valueOf(row.getString("clustering_order").toUpperCase());
>     AbstractType<?> type = parse(keyspace, row.getString("type"), types);
>     if (order == ClusteringOrder.DESC)
>         type = ReversedType.getInstance(type);
>     // Injected log statement: prints the column name and its bytes decoded as UTF-8
>     logger.info(String.format("column_name = %s, column_name_bytes = %s",
>                               row.getString("column_name"),
>                               new String(row.getBytes("column_name_bytes").array(), StandardCharsets.UTF_8)));
>     ColumnIdentifier name = new ColumnIdentifier(row.getBytes("column_name_bytes"), row.getString("column_name"));
>     return new ColumnDefinition(keyspace, table, name, type, position, kind);
> }{code}
> h2. Logs
> INFO [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 -
> *{color:#de350b}column_name = 666f6f, column_name_bytes = foo{color}*
> It should be: +column_name_bytes = {color:#172b4d}666f6f{color}+
> {code:java}
> INFO [main] 2023-08-07 02:21:53,722 StorageService.java:773 - Populating
> token metadata from system tables
> INFO [main] 2023-08-07 02:21:53,736 StorageService.java:780 - Token
> metadata: Normal Tokens:
> localhost/127.0.0.1:[95610762103941981519101009083045058398]
> INFO [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name =
> column1, column_name_bytes = column1
> INFO [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name =
> foo, column_name_bytes = foo
> INFO [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name =
> key, column_name_bytes = key
> INFO [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name =
> value, column_name_bytes = value
> INFO [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 - column_name =
> 666f6f, column_name_bytes = foo // Incorrect!
> INFO [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 - column_name =
> column1, column_name_bytes = column1{code}
> h1. Reproduce Method
> I have attached the data tar file. If you start Cassandra 3.11.15 with it and
> inject the log statement above to print the buffer value, the incorrect
> value shows up in the log.
> h1. Thoughts
> This is a transient bug which won't lead to exceptions or error logs. But the
> incorrect byte representation might lead to some issues.
> This bug shares the same triggering method as CASSANDRA-14468, and I believe
> it also shares the same root cause. In CASSANDRA-14468, the incorrect byte
> representation could lead to an upgrade exception. It was partially fixed by
> avoiding interning of the ColumnIdentifier (which makes this bug transient).
> But the real root cause remains, and it's still possible to cause other
> problems.
>
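The two byte representations discussed in the report can be reproduced outside Cassandra. The following is a minimal standalone sketch, not Cassandra code: the class name ColumnNameBytesDemo and its helper methods are hypothetical and exist only for illustration. It shows that reading the name "666f6f" as text yields the bytes 363636663666, while reading it as a hex dump yields the bytes 666f6f, which decode to "foo" as seen in the log.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from the Cassandra code base.
public class ColumnNameBytesDemo {
    /** Renders the remaining bytes of a buffer as lowercase hex without moving its position. */
    static String toHex(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate();
        StringBuilder sb = new StringBuilder();
        while (view.hasRemaining()) {
            sb.append(String.format("%02x", view.get() & 0xff));
        }
        return sb.toString();
    }

    /** Interprets the name as text: each character becomes its UTF-8 byte(s). */
    static ByteBuffer utf8Bytes(String name) {
        return ByteBuffer.wrap(name.getBytes(StandardCharsets.UTF_8));
    }

    /** Interprets the name as a hex dump: each pair of hex digits becomes one byte. */
    static ByteBuffer hexBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return ByteBuffer.wrap(out);
    }

    public static void main(String[] args) {
        String name = "666f6f";
        ByteBuffer asText = utf8Bytes(name);
        ByteBuffer asHex = hexBytes(name);
        System.out.println("text interpretation: " + toHex(asText));
        System.out.println("hex interpretation:  " + toHex(asHex));
        System.out.println("hex bytes as UTF-8:  " + StandardCharsets.UTF_8.decode(asHex.duplicate()));
    }
}
```

Printing buffers through a hex helper like toHex, rather than decoding them as UTF-8, makes the two cases easier to tell apart in a log, since the bytes 666f6f decode to the same "foo" that the column in test.alpha legitimately has.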