[jira] [Updated] (CASSANDRA-18728) [Transient Bug] Incorrect ByteBuffer representation of ColumnIdentifiers when 3.11.16 loading legacy data from 2.x

Ke Han (Jira) Wed, 04 Oct 2023 21:22:29 -0700


     [ 
https://issues.apache.org/jira/browse/CASSANDRA-18728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ke Han updated CASSANDRA-18728:
-------------------------------
    Description: 
 
h1. Bug Description

When using Cassandra 3.11.16 to load legacy data from 2.2.10, I noticed that 
the byte representation of the column identifier is incorrect.
The legacy data contain two tables, and the schema is as follows.
{code:java}
cqlsh> desc test.alpha ;
CREATE TABLE test.alpha (
    key text PRIMARY KEY,
    foo text
) WITH COMPACT STORAGE
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32'}
    AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';
cqlsh> DESC test.foos ;
CREATE TABLE test.foos (
    key text PRIMARY KEY,
    "666f6f" text
) WITH COMPACT STORAGE
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32'}
    AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';
CREATE INDEX idx_foo ON test.foos ("666f6f"); {code}
There is a column in test.foos with {*}name = "666f6f"{*}. The corresponding 
byte representation should be the UTF-8 bytes of that string, i.e. 
Hex("666f6f") == {*}363636663666{*}. However, when 3.11.16 loads the data and 
creates the column, the ByteBuffer holds the bytes 0x666f6f instead (which 
decode to "foo"). This can be checked by injecting a log statement: 
{code:java}
// src/java/org/apache/cassandra/schema/SchemaKeyspace.java
public static ColumnDefinition createColumnFromRow(UntypedResultSet.Row row, Types types)
{
    String keyspace = row.getString("keyspace_name");
    String table = row.getString("table_name");
    ColumnDefinition.Kind kind = ColumnDefinition.Kind.valueOf(row.getString("kind").toUpperCase());
    int position = row.getInt("position");
    ClusteringOrder order = ClusteringOrder.valueOf(row.getString("clustering_order").toUpperCase());
    AbstractType<?> type = parse(keyspace, row.getString("type"), types);
    if (order == ClusteringOrder.DESC)
        type = ReversedType.getInstance(type);
    // Injected log to check byteBuffer value
    logger.info(String.format("column_name = %s, column_name_bytes = %s",
                              row.getString("column_name"),
                              new String(row.getBytes("column_name_bytes").array(), StandardCharsets.UTF_8)));
    ColumnIdentifier name = new ColumnIdentifier(row.getBytes("column_name_bytes"), row.getString("column_name"));
    return new ColumnDefinition(keyspace, table, name, type, position, kind);
}{code}
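A side note on the injected log: ByteBuffer.array() returns the whole backing array and ignores the buffer's position and limit. For the values in this report that makes no visible difference, but whether row.getBytes can ever return a buffer with a non-zero offset has not been checked here. The following stand-alone sketch (JDK only; the class name BufferLogDemo and the helper utf8 are hypothetical, not Cassandra code) shows a variant that decodes only the readable region and leaves the caller's position untouched.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferLogDemo
{
    // Decode only the readable region [position, limit) of the buffer.
    // duplicate() gives an independent position, so the caller's buffer is not consumed.
    static String utf8(ByteBuffer buffer)
    {
        return StandardCharsets.UTF_8.decode(buffer.duplicate()).toString();
    }

    public static void main(String[] args)
    {
        byte[] backing = "xxfooyy".getBytes(StandardCharsets.UTF_8);
        // A buffer that exposes only the three bytes of "foo" inside a larger array.
        ByteBuffer slice = ByteBuffer.wrap(backing, 2, 3);

        // array() ignores position and limit and returns the full backing array.
        System.out.println(new String(slice.array(), StandardCharsets.UTF_8)); // prints "xxfooyy"
        // The helper respects position and limit.
        System.out.println(utf8(slice)); // prints "foo"
    }
}
```

In the injected log statement, the same idea would mean passing row.getBytes("column_name_bytes") through such a helper instead of calling array() on it.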
h2. Logs

INFO  [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 - *{color:#de350b}column_name = 666f6f, column_name_bytes = foo{color}*

It should be: +column_name_bytes = {color:#172b4d}666f6f{color}+
{code:java}
INFO  [main] 2023-08-07 02:21:53,722 StorageService.java:773 - Populating token metadata from system tables
INFO  [main] 2023-08-07 02:21:53,736 StorageService.java:780 - Token metadata: Normal Tokens:
localhost/127.0.0.1:[95610762103941981519101009083045058398]
INFO  [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = column1, column_name_bytes = column1
INFO  [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = foo, column_name_bytes = foo
INFO  [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = key, column_name_bytes = key
INFO  [main] 2023-08-07 02:21:53,756 SchemaKeyspace.java:1136 - column_name = value, column_name_bytes = value
INFO  [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 - column_name = 666f6f, column_name_bytes = foo // Incorrect!
INFO  [main] 2023-08-07 02:21:53,762 SchemaKeyspace.java:1136 - column_name = column1, column_name_bytes = column1{code}
h1. Reproduce Method
h2. Method1: load attached data file

I have attached the data tar file. If you start Cassandra 3.11.16 with it and 
inject the log statement to print out the buffer value, the incorrect value 
shows up in the log.
h2. Method2: Generate data from the old version (2.1.19)

Start Cassandra 2.1.19 and use bin/cassandra-cli to create the following 
keyspace and column families:
{code:java}
create keyspace test with strategy_options = {replication_factor:1} and 
placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';

use test;

create column family alpha 
with column_type = 'Standard' 
and comparator = 'UTF8Type' 
and key_validation_class = 'UTF8Type' 
and column_metadata = [{column_name: 'foo', validation_class: 'UTF8Type'}];

create column family foos 
with column_type = 'Standard' 
and comparator = 'BytesType'
and key_validation_class = 'UTF8Type' 
and column_metadata = [{column_name: '666f6f', validation_class: 'UTF8Type'}];
{code}
Then load the data with 3.0.16, with the log statement injected, and the log 
lines shown above will appear.
h1. Thoughts

This is a transient bug: it does not lead to exceptions or error logs. However, 
the incorrect byte representation might lead to other issues.

This bug has the same trigger as CASSANDRA-14468, and I believe it also shares 
the same root cause. In CASSANDRA-14468, the incorrect byte representation 
could lead to an exception during upgrade. That was partially fixed by no 
longer interning the ColumnIdentifier (which is what makes this bug transient), 
but the real root cause remains and could still cause other problems.
h1. Root Cause

== TL;DR ==

The new version (3.11.x) uses the *comparator* of the table to create the 
ColumnIdentifier. If the comparator of the old-version table is 
{*}"BytesType"{*}, the new version assumes that the old regular column name is 
already a hex representation of the bytes and thus [it parses the string as hex 
into the 
ByteBuffer|https://github.com/apache/cassandra/blob/058621a446d1b128c429bc5a40b67c5158524146/src/java/org/apache/cassandra/schema/LegacySchemaMigrator.java#L744], 
instead of storing the UTF-8 bytes of the name.

This generates a ColumnIdentifier whose text and bytes are inconsistent (bytes 
shown in hex):
 * Actual ColumnIdentifier: {text = "666f6f", bytes = {*}"666f6f"{*}}.
 * The correct ColumnIdentifier should be {text = "666f6f", bytes = 
{*}"363636663666"{*}}.
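To make the two representations concrete, here is a minimal stand-alone sketch that uses only the JDK. The class name ColumnNameBytesDemo and the helpers hexToBytes and toHex are hypothetical stand-ins; hexToBytes only approximates what a hex parser such as the one used by BytesType does, and none of this is Cassandra code.

```java
import java.nio.charset.StandardCharsets;

public class ColumnNameBytesDemo
{
    // Hypothetical stand-in for a hex parser: every two hex digits become one byte.
    static byte[] hexToBytes(String hex)
    {
        if (hex.length() % 2 != 0)
            throw new NumberFormatException("odd number of hex digits: " + hex);
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++)
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        return out;
    }

    // Render bytes as lowercase hex so both representations can be compared.
    static String toHex(byte[] bytes)
    {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes)
            sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args)
    {
        String name = "666f6f";

        // What the report observes: the name is parsed as hex digits.
        byte[] parsedAsHex = hexToBytes(name);
        // What the report expects: the UTF-8 bytes of the name itself.
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);

        // prints "666f6f -> foo"
        System.out.println(toHex(parsedAsHex) + " -> " + new String(parsedAsHex, StandardCharsets.UTF_8));
        // prints "363636663666 -> 666f6f"
        System.out.println(toHex(utf8) + " -> " + new String(utf8, StandardCharsets.UTF_8));
    }
}
```

The first line corresponds to the "column_name_bytes = foo" log entry, the second to the value the report says should have been logged.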

== Full Version ==

In more detail, this is how it happens:

1. In 
[code|https://github.com/apache/cassandra/blob/058621a446d1b128c429bc5a40b67c5158524146/src/java/org/apache/cassandra/schema/LegacySchemaMigrator.java#L744],
 it tries to intern the ColumnIdentifier using the comparator. The comparator is 
BytesType and the column name is "666f6f".

 
{code:java}
ColumnIdentifier.getInterned(comparator.fromString(row.getString("column_name")),
 comparator);
{code}
2. comparator.fromString(row.getString("column_name")) parses the name as a hex 
string and returns a ByteBuffer containing the bytes {*}"666f6f"{*} (in hex). 
As the code below shows, BytesType assumes that the source is already a hex 
representation of the bytes.
{code:java}
// BytesType.java
public ByteBuffer fromString(String source)
{
    try
    {
        return ByteBuffer.wrap(Hex.hexToBytes(source));
    }
    catch (NumberFormatException e)
    {
        logger.info("running into MarshalException");
        throw new MarshalException(String.format("cannot parse '%s' as hex bytes", source), e);
    }
} {code}
 

 

3. ColumnIdentifier.getInterned uses the returned ByteBuffer to create a new 
ColumnIdentifier object.

 
{code:java}
text = "666f6f"
bytes = "666f6f"{code}
 
h1. Fix

This can be fixed in a simple way. If the comparator is BytesType, we 
shouldn't use the comparator to get the ByteBuffer. Instead, treat the column 
name as a plain string and use ByteBufferUtil.bytes directly to get its bytes.

The generated column identifier will then be \{text = "666f6f", bytes = 
"363636663666"}.

The patch is here.
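Independently of the attached patch, the idea can be sketched in a few lines. This is a stand-alone illustration under stated assumptions, not the patch itself and not Cassandra code: the NameParser interface, the UTF8 and BYTES constants, and the columnNameBytes method are hypothetical stand-ins for the comparator and the migration step, and the UTF-8 encoding plays the role that ByteBufferUtil.bytes would play in the real code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LegacyColumnNameSketch
{
    // Hypothetical stand-in for a comparator; only string parsing is modelled.
    interface NameParser
    {
        ByteBuffer fromString(String source);
    }

    // Stand-in for a UTF-8 comparator: the name's UTF-8 bytes.
    static final NameParser UTF8 = s -> ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));

    // Stand-in for a bytes comparator: the name is parsed as hex digits.
    static final NameParser BYTES = s -> {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++)
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        return ByteBuffer.wrap(out);
    };

    // Proposed behaviour: for the bytes comparator, do not parse the name as hex;
    // use the UTF-8 bytes of the name so that text and bytes stay consistent.
    static ByteBuffer columnNameBytes(String name, NameParser comparator)
    {
        if (comparator == BYTES)
            return ByteBuffer.wrap(name.getBytes(StandardCharsets.UTF_8));
        return comparator.fromString(name);
    }

    // Render the readable region of a buffer as lowercase hex without consuming it.
    static String toHex(ByteBuffer buffer)
    {
        StringBuilder sb = new StringBuilder();
        ByteBuffer view = buffer.duplicate();
        while (view.hasRemaining())
            sb.append(String.format("%02x", view.get()));
        return sb.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(toHex(BYTES.fromString("666f6f")));           // current: 666f6f
        System.out.println(toHex(columnNameBytes("666f6f", BYTES)));     // proposed: 363636663666
        System.out.println(toHex(columnNameBytes("foo", UTF8)));         // unchanged: 666f6f
    }
}
```

Only the BytesType case changes in this sketch; names under other comparators still go through the comparator as before.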

 

 


> [Transient Bug] Incorrect ByteBuffer representation of ColumnIdentifiers when 
> 3.11.16 loading legacy data from 2.x
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18728
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18728
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Ke Han
>            Priority: Normal
>             Fix For: 3.11.x
>
>         Attachments: data.tar.gz, system.log
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
