Ke Han created CASSANDRA-18728:
----------------------------------
Summary: [Transient Bug] Incorrect ByteBuffer representation of
ColumnIdentifiers when 3.11.15 loading legacy data from 2.x
Key: CASSANDRA-18728
URL: https://issues.apache.org/jira/browse/CASSANDRA-18728
Project: Cassandra
Issue Type: Bug
Reporter: Ke Han
Attachments: data.tar.gz, system.log
h1. Description
When using Cassandra 3.11.15 to load legacy data from 2.2.10, I noticed that
the byte representation of the column identifier is incorrect.
The legacy data contain two tables, and the schema is as follows.
{code:java}
cqlsh> desc test.alpha ;
CREATE TABLE test.alpha (
key text PRIMARY KEY,
foo text
) WITH COMPACT STORAGE
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32'}
AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = 'NONE';

cqlsh> DESC test.foos ;
CREATE TABLE test.foos (
key text PRIMARY KEY,
"666f6f" text
) WITH COMPACT STORAGE
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32'}
AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = 'NONE';
CREATE INDEX idx_foo ON test.foos ("666f6f"); {code}
There is a column in test.foos with {*}name = "666f6f"{*}. The corresponding
byte representation should be Hex(666f6f) == {*}363636663666{*}. However, when
3.11.15 loads the data and creates the column, the ByteBuffer still holds the
bytes 666f6f (which print as "foo").
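To make the expected and observed values concrete, here is a minimal standalone sketch of the encoding arithmetic. It uses only the JDK; the class name {{ColumnNameBytes}} and its helpers are hypothetical and are not part of Cassandra. It assumes the column name is meant to be stored as UTF-8 text, which is the premise of this report.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not Cassandra code.
public class ColumnNameBytes
{
    /** Hex-encodes the remaining bytes of a buffer without moving its position. */
    static String toHex(ByteBuffer buffer)
    {
        ByteBuffer copy = buffer.duplicate();
        StringBuilder hex = new StringBuilder();
        while (copy.hasRemaining())
            hex.append(String.format("%02x", copy.get() & 0xff));
        return hex.toString();
    }

    /** UTF-8 bytes of a column name (assumption: the name is plain text). */
    static ByteBuffer utf8Bytes(String columnName)
    {
        return ByteBuffer.wrap(columnName.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args)
    {
        String columnName = "666f6f";

        // What this report expects: the UTF-8 bytes of the six-character name.
        ByteBuffer expected = utf8Bytes(columnName);
        // What the log shows: three bytes that decode to "foo".
        ByteBuffer observed = ByteBuffer.wrap(new byte[]{ 0x66, 0x6f, 0x6f });

        System.out.println("expected hex  = " + toHex(expected)); // 363636663666
        System.out.println("observed hex  = " + toHex(observed)); // 666f6f
        System.out.println("observed text = " + StandardCharsets.UTF_8.decode(observed.duplicate())); // foo
    }
}
```

The six characters of the name encode to six bytes (36 36 36 66 36 66), while the buffer seen at load time holds only three (66 6f 6f).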
{code:java}
// src/java/org/apache/cassandra/schema/SchemaKeyspace.java
public static ColumnDefinition createColumnFromRow(UntypedResultSet.Row row, Types types)
{
    String keyspace = row.getString("keyspace_name");
    String table = row.getString("table_name");

    ColumnDefinition.Kind kind = ColumnDefinition.Kind.valueOf(row.getString("kind").toUpperCase());

    int position = row.getInt("position");
    ClusteringOrder order = ClusteringOrder.valueOf(row.getString("clustering_order").toUpperCase());

    AbstractType<?> type = parse(keyspace, row.getString("type"), types);
    if (order == ClusteringOrder.DESC)
        type = ReversedType.getInstance(type);

    // Injected log statement: prints the textual name next to its stored bytes
    logger.info(String.format("column_name = %s, column_name_bytes = %s" ,
                              row.getString("column_name"), row.getBytes("column_name_bytes").array()));

    ColumnIdentifier name = new ColumnIdentifier(row.getBytes("column_name_bytes"), row.getString("column_name"));
    return new ColumnDefinition(keyspace, table, name, type, position, kind);
}
{code}
Logs
INFO [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 -
*{color:#de350b}column_name = 666f6f, column_name_bytes = foo{color}*
It should be : +column_name_bytes = {color:#172b4d}666f6f{color}+
{code:java}
INFO [main] 2023-08-07 02:21:53,722 StorageService.java:773 - Populating token metadata from system tables
INFO [main] 2023-08-07 02:21:53,736 StorageService.java:780 - Token metadata:
Normal Tokens:
localhost/127.0.0.1:[95610762103941981519101009083045058398]
INFO [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = column1, column_name_bytes = column1
INFO [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = foo, column_name_bytes = foo
INFO [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = key, column_name_bytes = key
INFO [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = value, column_name_bytes = value
INFO [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 - column_name = 666f6f, column_name_bytes = foo
INFO [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 - column_name = column1, column_name_bytes = column1
{code}
h1. Reproduce Method
I have attached the data tar file. If you start Cassandra 3.11.15 with it and
inject the log statement shown above to print the buffer value, the log shows
the incorrect value.
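When reproducing, it can help to log an explicit consistency flag next to the two values rather than comparing them by eye. The following is a rough JDK-only sketch of such a check. The class {{ColumnNameCheck}} and its methods are hypothetical, and the check assumes that the stored bytes should equal the UTF-8 encoding of the textual name, which is the premise of this report rather than a confirmed Cassandra invariant.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper for illustration only; not Cassandra code.
public class ColumnNameCheck
{
    /** True if the buffer's remaining bytes equal the UTF-8 encoding of the text. */
    static boolean bytesMatchText(String text, ByteBuffer bytes)
    {
        // ByteBuffer.equals compares the remaining bytes of both buffers.
        return ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8)).equals(bytes);
    }

    /** Builds a log line in the same shape as the injected statement, plus a flag. */
    static String describe(String text, ByteBuffer bytes)
    {
        // duplicate() keeps the caller's buffer position untouched while decoding.
        String decoded = StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
        return String.format("column_name = %s, column_name_bytes = %s, consistent = %b",
                             text, decoded, bytesMatchText(text, bytes));
    }

    public static void main(String[] args)
    {
        ByteBuffer foo = ByteBuffer.wrap("foo".getBytes(StandardCharsets.UTF_8));

        System.out.println(describe("foo", foo));    // consistent = true
        System.out.println(describe("666f6f", foo)); // consistent = false
    }
}
```

With the attached data, the row for test.foos would be the one reported as inconsistent under this assumption.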
h1. Thoughts
This is a transient bug that does not produce exceptions or error logs, but the
incorrect byte representation might lead to critical issues.
This bug is triggered the same way as CASSANDRA-14468, and I believe it also
shares the same root cause. In CASSANDRA-14468, the incorrect byte
representation could lead to an upgrade exception. It was partially fixed by
avoiding the interning of ColumnIdentifier (which makes this bug transient).
But the real root cause remains, and it can still cause other problems.