[jira] [Updated] (CASSANDRA-18728) [Transient Bug] Incorrect ByteBuffer representation of ColumnIdentifiers when 3.11.15 loading legacy data from 2.x

Ke Han (Jira) Sun, 06 Aug 2023 19:55:10 -0700


     [ https://issues.apache.org/jira/browse/CASSANDRA-18728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ke Han updated CASSANDRA-18728:
-------------------------------
    Bug Category: Parent values: Correctness(12982)

> [Transient Bug] Incorrect ByteBuffer representation of ColumnIdentifiers when 
> 3.11.15 loading legacy data from 2.x
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18728
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18728
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ke Han
>            Priority: Normal
>         Attachments: data.tar.gz, system.log
>
>
> h1. Bug Description
> When using Cassandra 3.11.15 to load legacy data from 2.2.10, I noticed that 
> the byte representation of the column identifier is incorrect.
> The legacy data contain two tables, and the schema is as follows.
> {code:java}
> cqlsh> desc test.alpha ;
> CREATE TABLE test.alpha (
>     key text PRIMARY KEY,
>     foo text
> ) WITH COMPACT STORAGE
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = 'NONE';
> cqlsh> DESC test.foos ;
> CREATE TABLE test.foos (
>     key text PRIMARY KEY,
>     "666f6f" text
> ) WITH COMPACT STORAGE
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = 'NONE';
> CREATE INDEX idx_foo ON test.foos ("666f6f"); {code}
> The table test.foos contains a column with {*}name = "666f6f"{*}. Its byte 
> representation should be the UTF-8 encoding of that name, Hex(666f6f) == 
> {*}363636663666{*}. However, when 3.11.15 loads the data and creates the 
> column, the ByteBuffer still holds the raw bytes 0x666f6f, which decode to 
> "foo". 
> {code:java}
> // src/java/org/apache/cassandra/schema/SchemaKeyspace.java
> public static ColumnDefinition createColumnFromRow(UntypedResultSet.Row row, 
> Types types)
> {
>     String keyspace = row.getString("keyspace_name");
>     String table = row.getString("table_name");    
>     ColumnDefinition.Kind kind = 
> ColumnDefinition.Kind.valueOf(row.getString("kind").toUpperCase());
>     int position = row.getInt("position");
>     ClusteringOrder order = 
> ClusteringOrder.valueOf(row.getString("clustering_order").toUpperCase());
>     AbstractType<?> type = parse(keyspace, row.getString("type"), types);
>     if (order == ClusteringOrder.DESC)
>         type = ReversedType.getInstance(type);
>     logger.info(String.format("column_name = %s, column_name_bytes = %s" , 
> row.getString("column_name"), new 
> String(row.getBytes("column_name_bytes").array(), StandardCharsets.UTF_8)));
>     ColumnIdentifier name = new 
> ColumnIdentifier(row.getBytes("column_name_bytes"), 
> row.getString("column_name"));
>     return new ColumnDefinition(keyspace, table, name, type, position, kind);
> }{code}
> h2. Logs
> INFO  [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 - 
> *{color:#de350b}column_name = 666f6f, column_name_bytes = foo{color}*
> It should be: +column_name_bytes = {color:#172b4d}666f6f{color}+
> {code:java}
> INFO  [main] 2023-08-07 02:21:53,722 StorageService.java:773 - Populating 
> token metadata from system tables
> INFO  [main] 2023-08-07 02:21:53,736 StorageService.java:780 - Token 
> metadata: Normal Tokens:
> localhost/127.0.0.1:[95610762103941981519101009083045058398]
> INFO  [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = 
> column1, column_name_bytes = column1
> INFO  [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = 
> foo, column_name_bytes = foo
> INFO  [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = 
> key, column_name_bytes = key
> INFO  [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = 
> value, column_name_bytes = value
> INFO  [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 - column_name = 
> 666f6f, column_name_bytes = foo // Incorrect!
> INFO  [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 - column_name = 
> column1, column_name_bytes = column1{code}
> h1. Reproduce Method
> I have attached the data tar file. If you start Cassandra 3.11.15 with it and 
> inject a log statement that prints the buffer value, the log shows the 
> incorrect value.
> h1. Thoughts
> This is a transient bug that does not produce exceptions or error logs, but 
> the incorrect byte representation might still cause problems.
> This bug is triggered the same way as CASSANDRA-14468, and I believe it 
> shares the same root cause. In CASSANDRA-14468, the incorrect byte 
> representation could lead to an exception during upgrade. That issue was 
> partially fixed by no longer interning the ColumnIdentifier (which is what 
> makes this bug transient), but the real root cause remains and could still 
> cause other problems.
>  
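
The mismatch reported above can be illustrated in isolation. The following is a minimal sketch that uses only the JDK, not Cassandra's own classes; the class name ColumnNameBytesDemo and the helpers toHex, toUtf8 and fromHex are hypothetical names introduced for illustration. It assumes, as the report does, that the expected buffer is the UTF-8 encoding of the column name, and contrasts it with the buffer contents implied by the log line.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone demo; not part of the Cassandra code base.
final class ColumnNameBytesDemo {

    // Hex-encode the remaining bytes; duplicate() leaves the caller's position untouched.
    static String toHex(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate();
        StringBuilder sb = new StringBuilder();
        while (view.hasRemaining()) {
            sb.append(String.format("%02x", view.get() & 0xff));
        }
        return sb.toString();
    }

    // Decode the remaining bytes as UTF-8, again through a duplicate.
    static String toUtf8(ByteBuffer buffer) {
        return StandardCharsets.UTF_8.decode(buffer.duplicate()).toString();
    }

    // Parse a hex string such as "666f6f" into the raw bytes it denotes.
    static ByteBuffer fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return ByteBuffer.wrap(out);
    }

    public static void main(String[] args) {
        String columnName = "666f6f";

        // Expected per the report: the UTF-8 bytes of the six-character name.
        ByteBuffer expected = ByteBuffer.wrap(columnName.getBytes(StandardCharsets.UTF_8));

        // Observed per the log: the name interpreted as hex, i.e. three raw bytes.
        ByteBuffer observed = fromHex(columnName);

        System.out.println("expected hex  = " + toHex(expected));   // 363636663666
        System.out.println("expected text = " + toUtf8(expected));  // 666f6f
        System.out.println("observed hex  = " + toHex(observed));   // 666f6f
        System.out.println("observed text = " + toUtf8(observed));  // foo
    }
}
```

The last line matches the logged "column_name_bytes = foo". The helpers read through duplicate() rather than array(), since array() returns the whole backing array and ignores the buffer's position and limit.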



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
