[ 
https://issues.apache.org/jira/browse/CASSANDRA-12423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420740#comment-15420740
 ] 

Stefania commented on CASSANDRA-12423:
--------------------------------------

Thank you for the details. I've created 2 new tests 
[here|https://github.com/stef1927/cassandra-dtest/commits/12423], one for 
upgrading sstables from 2.1.9 to 3.0+ and one that just creates a range 
tombstone with EOC=0. I've reproduced the problem in both cases.

The patch is in {{LegacyLayout}} as indicated above:

||3.0||3.9||trunk||
|[patch|https://github.com/stef1927/cassandra/commits/12423-3.0]|[patch|https://github.com/stef1927/cassandra/commits/12423-3.9]|[patch|https://github.com/stef1927/cassandra/commits/12423]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12423-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12423-3.9-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12423-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12423-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12423-3.9-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12423-dtest/]|

Aside from this issue, there is a second problem: the clustering column 
names "c1" and "c2" are converted into their hex bytes "6331" and "6332". I've 
traced this to the following code in {{LegacySchemaMigrator}}:

{code}
        // Note: we save the column name as string, but we should not assume that it is an UTF8 name, we
        // we need to use the comparator fromString method
        AbstractType<?> comparator = isCQLTable
                                     ? UTF8Type.instance
                                     : CompactTables.columnDefinitionComparator(rawKind, isSuper, rawComparator, rawSubComparator);
        ColumnIdentifier name = ColumnIdentifier.getInterned(comparator.fromString(row.getString("column_name")), comparator);
{code}

The problem is that there is another table, {{Standard3}}, created by 
thrift_test.py. For that table we create an interned column identifier with 
the comparator set to {{BytesType}}, so the column name ends up being the 
hex bytes corresponding to "c1", that is "6331", and similarly for "c2". This by 
itself seems wrong, but maybe it is a known thrift limitation. Then, when we 
create the identifier for our test table, even though the comparator is 
correctly set to UTF8, {{ColumnIdentifier.internedInstances}} ensures that we 
pick up the same incorrect value, because the UTF-8 encoding of "c1" or "c2" 
has the same bytes. [~iamaleksey], is this something you are aware of, or does 
it need fixing? If it does, how can we ever rely on 
{{ColumnIdentifier.internedInstances}} when we use non-UTF8 comparators? My 
guess is that {{CompactTables.columnDefinitionComparator}} is returning the 
wrong comparator for {{Standard3}}, but I'm not 100% sure.
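
To make the collision described above concrete, here is a minimal, self-contained sketch. It is not the Cassandra code: the {{Renderer}} interface, the {{interned}} map and {{getInterned}} are hypothetical stand-ins, and the sketch assumes the cache is keyed on the raw name bytes only, which is my reading of the behaviour above rather than something I have confirmed for every code path.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InternCollisionSketch {
    // Hypothetical stand-in for a comparator: it only knows how to render bytes as text.
    interface Renderer {
        String toText(ByteBuffer bytes);
    }

    // Renders the bytes as a UTF-8 string, the way a UTF8-style comparator would.
    static final Renderer UTF8 = b -> StandardCharsets.UTF_8.decode(b.duplicate()).toString();

    // Renders the bytes as lowercase hex, the way a bytes-style comparator would.
    static final Renderer HEX = b -> {
        StringBuilder sb = new StringBuilder();
        ByteBuffer d = b.duplicate();
        while (d.hasRemaining())
            sb.append(String.format("%02x", d.get()));
        return sb.toString();
    };

    // ASSUMPTION: the cache key is the raw bytes alone; the renderer is not part of the key.
    static final Map<ByteBuffer, String> interned = new ConcurrentHashMap<>();

    // Returns the cached text for these bytes, rendering it only on the first request.
    static String getInterned(ByteBuffer bytes, Renderer renderer) {
        return interned.computeIfAbsent(bytes, renderer::toText);
    }

    public static void main(String[] args) {
        // First request: bytes 0x63 0x31 rendered as hex.
        ByteBuffer thriftName = ByteBuffer.wrap(new byte[] { 0x63, 0x31 });
        System.out.println(getInterned(thriftName, HEX));   // prints 6331

        // Second request: "c1" as UTF-8 is the same two bytes, so the cached hex text wins.
        ByteBuffer cqlName = ByteBuffer.wrap("c1".getBytes(StandardCharsets.UTF_8));
        System.out.println(getInterned(cqlName, UTF8));     // prints 6331, not c1
    }
}
```

In this sketch, keying the map on both the bytes and the renderer would avoid the collision. Whether that is the right fix for {{ColumnIdentifier}}, or whether the comparator chosen for {{Standard3}} is the real bug, is the open question above.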

I've attached the sstables generated with 2.1, [^12423.tar.gz]. If you want to 
reproduce it quickly, just start 3.0 with these sstables after setting the 
partitioner and cluster name to {{ByteOrderedPartitioner}} and {{test}} 
respectively.



> Cells missing from compact storage table after upgrading from 2.1.9 to 3.7
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12423
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12423
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Tomasz Grabiec
>            Assignee: Stefania
>         Attachments: 12423.tar.gz
>
>
> Schema:
> {code}
> create table ks1.test ( id int, c1 text, c2 text, v int, primary key (id, c1, 
> c2)) with compact storage and compression = {'sstable_compression': ''};
> {code}
> sstable2json before upgrading:
> {code}
> [
> {"key": "1",
>  "cells": [["","0",1470761440040513],
>            ["a","asd",2470761440040513,"t",1470764842],
>            ["asd:","0",1470761451368658],
>            ["asd:asd","0",1470761449416613]]}
> ]
> {code}
> Query result with 2.1.9:
> {code}
> cqlsh> select * from ks1.test;
>  id | c1  | c2   | v
> ----+-----+------+---
>   1 |     | null | 0
>   1 | asd |      | 0
>   1 | asd |  asd | 0
> (3 rows)
> {code}
> Query result with 3.7:
> {code}
> cqlsh> select * from ks1.test;
>  id | 6331 | 6332 | v
> ----+------+------+---
>   1 |      | null | 0
> (1 rows)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
