[
https://issues.apache.org/jira/browse/CASSANDRA-12423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420740#comment-15420740
]
Stefania commented on CASSANDRA-12423:
--------------------------------------
Thank you for the details. I've created 2 new tests
[here|https://github.com/stef1927/cassandra-dtest/commits/12423], one for
upgrading sstables from 2.1.9 to 3.0+ and one that just creates a range
tombstone with EOC=0. I've reproduced the problem in both cases.
The patch is in {{LegacyLayout}} as indicated above:
||3.0||3.9||trunk||
|[patch|https://github.com/stef1927/cassandra/commits/12423-3.0]|[patch|https://github.com/stef1927/cassandra/commits/12423-3.9]|[patch|https://github.com/stef1927/cassandra/commits/12423]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12423-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12423-3.9-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12423-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12423-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12423-3.9-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12423-dtest/]|
Aside from this issue, there is a separate problem: the clustering column
names "c1" and "c2" are converted into their hex bytes, "6331" and "6332". I've
traced this to the following code in {{LegacySchemaMigrator}}:
{code}
// Note: we save the column name as string, but we should not assume that it is an UTF8 name, we
// we need to use the comparator fromString method
AbstractType<?> comparator = isCQLTable
                           ? UTF8Type.instance
                           : CompactTables.columnDefinitionComparator(rawKind, isSuper, rawComparator, rawSubComparator);
ColumnIdentifier name = ColumnIdentifier.getInterned(comparator.fromString(row.getString("column_name")), comparator);
{code}
The problem is that there is another table, {{Standard3}}, created by
thrift_test.py. For that table we create an interned column identifier with the
comparator set to {{BytesType}}, so the column name ends up being the hex
representation of the bytes of "c1", that is "6331", and similarly "6332" for
"c2". This by itself seems wrong, but maybe it is a known thrift limitation.
Then, when we create the identifier for our test table, even though the
comparator is correctly set to UTF8 here, {{ColumnIdentifier.internedInstances}}
makes us pick up the same incorrect value, because the UTF-8 encoding of "c1"
or "c2" yields the same bytes. [~iamaleksey], is this something you are aware
of, or does it need fixing? If it does, how can we ever rely on
{{ColumnIdentifier.internedInstances}} when non-UTF8 comparators are in use? My
guess is that {{CompactTables.columnDefinitionComparator}} is returning the
wrong comparator for {{Standard3}}, but I'm not 100% sure.
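The suspected interaction can be illustrated with a minimal, self-contained
sketch. This is not the Cassandra code: {{NameType}}, {{UTF8}}, {{BYTES}} and
the {{interned}} map are hypothetical stand-ins, and the sketch assumes that
the intern cache is keyed by the raw name bytes only, which is what the
behaviour described above suggests.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class InternCollisionSketch {
    // Hypothetical stand-in for a comparator: converts a stored name to bytes and back.
    interface NameType {
        ByteBuffer fromString(String s);
        String getString(ByteBuffer b);
    }

    // Text names: the string is the UTF-8 decoding of the bytes.
    static final NameType UTF8 = new NameType() {
        public ByteBuffer fromString(String s) {
            return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
        }
        public String getString(ByteBuffer b) {
            return StandardCharsets.UTF_8.decode(b.duplicate()).toString();
        }
    };

    // Opaque byte names: the string form is the hex representation of the bytes.
    static final NameType BYTES = new NameType() {
        public ByteBuffer fromString(String hex) {
            byte[] out = new byte[hex.length() / 2];
            for (int i = 0; i < out.length; i++)
                out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
            return ByteBuffer.wrap(out);
        }
        public String getString(ByteBuffer b) {
            StringBuilder sb = new StringBuilder();
            ByteBuffer d = b.duplicate();
            while (d.hasRemaining())
                sb.append(String.format("%02x", d.get()));
            return sb.toString();
        }
    };

    // Assumption: the cache key is the bytes alone, so the type used by the
    // first caller decides the text seen by every later caller.
    static final Map<ByteBuffer, String> interned = new HashMap<>();

    static String getInterned(ByteBuffer bytes, NameType type) {
        return interned.computeIfAbsent(bytes, b -> type.getString(b));
    }

    public static void main(String[] args) {
        // A thrift-style table registers the name first, using the bytes type.
        String thriftName = getInterned(BYTES.fromString("6331"), BYTES);
        // A CQL table then asks for "c1" with the UTF8 type: same bytes, cached text.
        String cqlName = getInterned(UTF8.fromString("c1"), UTF8);
        System.out.println(thriftName); // prints 6331
        System.out.println(cqlName);    // prints 6331 rather than c1
    }
}
```

In this model, including the type in the cache key would keep the two
identifiers apart; whether that is the right fix for the real cache is an open
question.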
I've attached the sstables generated with 2.1, [^12423.tar.gz]. To reproduce
the problem quickly, start 3.0 with these sstables after setting the
partitioner to {{ByteOrderedPartitioner}} and the cluster name to {{test}}.
> Cells missing from compact storage table after upgrading from 2.1.9 to 3.7
> --------------------------------------------------------------------------
>
> Key: CASSANDRA-12423
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12423
> Project: Cassandra
> Issue Type: Bug
> Reporter: Tomasz Grabiec
> Assignee: Stefania
> Attachments: 12423.tar.gz
>
>
> Schema:
> {code}
> create table ks1.test ( id int, c1 text, c2 text, v int, primary key (id, c1,
> c2)) with compact storage and compression = {'sstable_compression': ''};
> {code}
> sstable2json before upgrading:
> {code}
> [
> {"key": "1",
> "cells": [["","0",1470761440040513],
> ["a","asd",2470761440040513,"t",1470764842],
> ["asd:","0",1470761451368658],
> ["asd:asd","0",1470761449416613]]}
> ]
> {code}
> Query result with 2.1.9:
> {code}
> cqlsh> select * from ks1.test;
>  id | c1  | c2   | v
> ----+-----+------+---
>   1 |     | null | 0
>   1 | asd |      | 0
>   1 | asd |  asd | 0
> (3 rows)
> {code}
> Query result with 3.7:
> {code}
> cqlsh> select * from ks1.test;
>  id | 6331 | 6332 | v
> ----+------+------+---
>   1 |      | null | 0
> (1 rows)
> {code}