[ 
https://issues.apache.org/jira/browse/CASSANDRA-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829145#comment-15829145
 ] 

Yasuharu Goto commented on CASSANDRA-13125:
-------------------------------------------

h2. Investigations...

After some debugging, I found an interesting difference in the serialized 
RangeTombstoneLists between 2.1.16 and 3.0.10.

- I ran 3 Cassandra nodes with some debug prints.
-- 127.0.0.1 (C* 3.0.10)
-- 127.0.0.2 (C* 2.1.16)
-- 127.0.0.3 (C* 2.1.16)
- They have a keyspace and a table already created.
-- CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}
-- CREATE TABLE test.test ( a int PRIMARY KEY, b int, c set<int>, d set<int>, e int )
- Then I issued the same INSERT (whose mutation is sent to 127.0.0.2) from 
127.0.0.1 (C* 3.0) and from 127.0.0.3 (C* 2.1) and compared the results.

Insert a row from 127.0.0.1 and scan. (The inserted a=14 row is broken.)
{code:sql}
cqlsh> insert into test.test(a,b,c,d,e) values(14,1,{2,3},{4,5},6);
cqlsh> select * from test.test;

 a  | b    | c      | d      | e
----+------+--------+--------+------
 14 |    1 |   null |   null | null
 14 | null | {2, 3} | {4, 5} |    6

(2 rows)
{code}

Then I insert from 127.0.0.3 and scan. (Neither a=5 nor a=14 is broken.)
{code:sql}
cqlsh> insert into test.test(a,b,c,d,e)values(5,1,{2,3},{4,5},6);
cqlsh> select * from test.test;

 a  | b | c      | d      | e
----+---+--------+--------+---
  5 | 1 | {2, 3} | {4, 5} | 6
 14 | 1 | {2, 3} | {4, 5} | 6
{code}

Back on 127.0.0.1, I scan the table again. a=14 is broken but a=5 is not.
{code:sql}
cqlsh> select * from test.test;

 a  | b    | c      | d      | e
----+------+--------+--------+------
  5 |    1 | {2, 3} | {4, 5} |    6
 14 |    1 |   null |   null | null
 14 | null | {2, 3} | {4, 5} |    6
{code}

Therefore, it looks like "C*3 can't properly scan rows that are stored in 
C*2 but were inserted from C*3."

Next, I observed the incoming MUTATIONs on 127.0.0.2, shown below. C*3.0 sent 
RangeTombstones like {{[c-c],[c-d]}}, whereas C*2.1 sent 
{{[c:_-c],[d:_-d]}}.

{noformat}
> insert into test.test(a,b,c,d,e) values(14,1,{2,3},{4,5},6); from 127.0.0.1

DeletionInfo:{deletedAt=-9223372036854775808, localDeletion=2147483647, 
ranges=[c-c:!, deletedAt=1484710273390930, localDeletion=1484710273][c-d:!, 
deletedAt=1484710273390930, localDeletion=1484710273]}
from:/127.0.0.1, payload:Mutation(keyspace='test', key='0000000e', 
modifications=[ColumnFamily(test -{deletedAt=-9223372036854775808, 
localDeletion=2147483647, ranges=[c-c:!, deletedAt=1484710273390930, 
localDeletion=1484710273][c-d:!, deletedAt=1484710273390930, 
localDeletion=1484710273]}- 
[:false:0@1484710273390931,b:false:4@1484710273390931,c:00000002:false:0@1484710273390931,c:00000003:false:0@1484710273390931,d:00000004:false:0@1484710273390931,d:00000005:false:0@1484710273390931,e:false:4@1484710273390931,])]),
 verb:MUTATION, version:8

> insert into test.test(a,b,c,d,e) values(14,1,{2,3},{4,5},6); from 127.0.0.3
DeletionInfo:{deletedAt=-9223372036854775808, localDeletion=2147483647, 
ranges=[c:_-c:!, deletedAt=1484710277987556, localDeletion=1484710277][d:_-d:!, 
deletedAt=1484710277987556, localDeletion=1484710277]}
from:/127.0.0.3, payload:Mutation(keyspace='test', key='0000000e', 
modifications=[ColumnFamily(test -{deletedAt=-9223372036854775808, 
localDeletion=2147483647, ranges=[c:_-c:!, deletedAt=1484710277987556, 
localDeletion=1484710277][d:_-d:!, deletedAt=1484710277987556, 
localDeletion=1484710277]}- 
[:false:0@1484710277987557,b:false:4@1484710277987557,c:00000002:false:0@1484710277987557,c:00000003:false:0@1484710277987557,d:00000004:false:0@1484710277987557,d:00000005:false:0@1484710277987557,e:false:4@1484710277987557,])]),
 verb:MUTATION, version:8
{noformat}
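
The difference above can be checked mechanically. The following is a minimal sketch, not Cassandra code: the class {{RangeTombstoneLogCheck}} and its methods are hypothetical, and it assumes that the input is only the {{ranges=...}} portion of the debug output and that column names contain neither '-' nor ':'. It reports every range whose start and end bounds name different columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RangeTombstoneLogCheck {
    // Matches one "[start-end," entry of the ranges=... debug output.
    private static final Pattern RANGE = Pattern.compile("\\[([^\\-\\],]+)-([^,\\]]+),");

    // Column part of a bound: "c:_" -> "c", "d:!" -> "d", "c" -> "c".
    static String column(String bound) {
        int i = bound.indexOf(':');
        return i < 0 ? bound : bound.substring(0, i);
    }

    // Returns the ranges whose start and end bounds refer to different columns.
    static List<String> crossColumnRanges(String ranges) {
        List<String> result = new ArrayList<>();
        Matcher m = RANGE.matcher(ranges);
        while (m.find()) {
            if (!column(m.group(1)).equals(column(m.group(2))))
                result.add(m.group(1) + "-" + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Range lists copied from the debug output above.
        String from30 = "[c-c:!, deletedAt=1484710273390930, localDeletion=1484710273]"
                      + "[c-d:!, deletedAt=1484710273390930, localDeletion=1484710273]";
        String from21 = "[c:_-c:!, deletedAt=1484710277987556, localDeletion=1484710277]"
                      + "[d:_-d:!, deletedAt=1484710277987556, localDeletion=1484710277]";
        System.out.println("from 127.0.0.1: " + crossColumnRanges(from30));
        System.out.println("from 127.0.0.3: " + crossColumnRanges(from21));
    }
}
```

For the ranges logged from 127.0.0.1 this flags {{c-d:!}}; for the ranges logged from 127.0.0.3 it flags nothing.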

h2. Workaround Plan-A

However, LegacyRangeTombstone removes {{collectionName}} from a RangeTombstone 
whose start and stop bounds name different collections, like {{[c-d]}}:
https://github.com/apache/cassandra/blob/cassandra-3.0.10/src/java/org/apache/cassandra/db/LegacyLayout.java#L1592-L1599
It seems that this removal of collectionName corrupts the unmarshalling of the 
legacy tombstone. After I commented out this else-if block, I could scan the 
table correctly.


{code:java}
            if ((start.collectionName == null) != (stop.collectionName == null))
            {
                if (start.collectionName == null)
                    stop = new LegacyBound(stop.bound, stop.isStatic, null);
                else
                    start = new LegacyBound(start.bound, start.isStatic, null);
            }
            /*else if (!Objects.equals(start.collectionName, stop.collectionName))
            {
                // We're in the similar but slightly more complex case where on top of the big tombstone
                // A, we have 2 (or more) collection tombstones B and C within A. So we also end up with
                // a tombstone that goes between the end of B and the start of C.
                start = new LegacyBound(start.bound, start.isStatic, null);
                stop = new LegacyBound(stop.bound, stop.isStatic, null);
            }
            */
{code}

{noformat}
cqlsh> select * from test.test;

 a  | b | c      | d      | e
----+---+--------+--------+---
  5 | 1 | {2, 3} | {4, 5} | 6
 14 | 1 | {2, 3} | {4, 5} | 6
{noformat}
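
To make the effect of the else-if branch concrete, here is a minimal, self-contained sketch. It is not the LegacyLayout code: the class {{CollectionNameSketch}} and the {{elseIfEnabled}} flag are hypothetical, and a bound is reduced to its collection name only (the real {{LegacyBound}} also carries {{bound}} and {{isStatic}}). It mirrors the two branches quoted above.

```java
import java.util.Arrays;
import java.util.Objects;

final class CollectionNameSketch {
    /**
     * Returns {startCollection, stopCollection} after the normalization quoted above.
     * A null entry means the bound no longer names a collection.
     * elseIfEnabled = false corresponds to Plan-A (else-if block commented out).
     */
    static String[] normalize(String start, String stop, boolean elseIfEnabled) {
        if ((start == null) != (stop == null)) {
            // Exactly one side names a collection: drop the name from that side.
            if (start == null)
                stop = null;
            else
                start = null;
        } else if (elseIfEnabled && !Objects.equals(start, stop)) {
            // Both sides name a collection, but different ones (e.g. [c-d]): drop both names.
            start = null;
            stop = null;
        }
        return new String[] { start, stop };
    }

    public static void main(String[] args) {
        System.out.println("[c-c], else-if enabled:  " + Arrays.toString(normalize("c", "c", true)));
        System.out.println("[c-d], else-if enabled:  " + Arrays.toString(normalize("c", "d", true)));
        System.out.println("[c-d], else-if disabled: " + Arrays.toString(normalize("c", "d", false)));
    }
}
```

With the else-if enabled, {{[c-d]}} loses both collection names while {{[c-c]}} is left untouched; with it disabled, {{[c-d]}} keeps both names.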

h2. Workaround Plan-B
Instead of modifying the LegacyLayout unmarshal code, commenting out the 
following line also fixed the problem. It changes the RangeTombstones 
serialized by LegacyLayout from {{[c-c][c-d]}} to {{[c-c][d-d]}}.

https://github.com/apache/cassandra/blob/cassandra-3.0.10/src/java/org/apache/cassandra/db/LegacyLayout.java#L2099
{code:java}
//                     start = ends[i];
{code}


{noformat}

DeletionInfo:{deletedAt=-9223372036854775808, localDeletion=2147483647, 
ranges=[c-c:!, deletedAt=1484715120458008, localDeletion=1484715120][d-d:!, 
deletedAt=1484715120458008, localDeletion=1484715120]}
from:/127.0.0.1, payload:Mutation(keyspace='test', key='0000000e', 
modifications=[ColumnFamily(test -{deletedAt=-9223372036854775808, 
localDeletion=2147483647, ranges=[c-c:!, deletedAt=1484715120458008, 
localDeletion=1484715120][d-d:!, deletedAt=1484715120458008, 
localDeletion=1484715120]}- 
[:false:0@1484715120458009,b:false:4@1484715120458009,c:00000002:false:0@1484715120458009,c:00000003:false:0@1484715120458009,d:00000004:false:0@1484715120458009,d:00000005:false:0@1484715120458009,e:false:4@1484715120458009,])]),
 verb:MUTATION, version:8
{noformat}

I'm not sure whether my solutions cause any unexpected side effects, but I 
attach my patches for reference.
Could anybody please review my patches?


> Duplicate rows after upgrading from 2.1.16 to 3.0.10/3.9
> --------------------------------------------------------
>
>                 Key: CASSANDRA-13125
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13125
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Zhongxiang Zheng
>
> I found that rows are split and duplicated after upgrading the cluster 
> from 2.1.x to 3.0.x.
> The problem can be reproduced as follows.
> {code}
> $ ccm create test -v 2.1.16 -n 3 -s
> Current cluster is now: test
> $ ccm node1 cqlsh -e "CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy', 'replication_factor':3}"
> $ ccm node1 cqlsh -e "CREATE TABLE test.test (id text PRIMARY KEY, value1 set<text>, value2 set<text>);"
> # Upgrade node1
> $ for i in 1; do ccm node${i} stop; ccm node${i} setdir -v3.0.10; ccm node${i} start;ccm node${i} nodetool upgradesstables; done
> # Insert a row through node1(3.0.10)
> $ ccm node1 cqlsh -e "INSERT INTO test.test (id, value1, value2) values ('aaa', {'aaa', 'bbb'}, {'ccc', 'ddd'});"
> # Insert a row through node2(2.1.16)
> $ ccm node2 cqlsh -e "INSERT INTO test.test (id, value1, value2) values ('bbb', {'aaa', 'bbb'}, {'ccc', 'ddd'});"
> # The row inserted from node1 is splitting
> $ ccm node1 cqlsh -e "SELECT * FROM test.test ;"
>  id  | value1         | value2
> -----+----------------+----------------
>  aaa |           null |           null
>  aaa | {'aaa', 'bbb'} | {'ccc', 'ddd'}
>  bbb | {'aaa', 'bbb'} | {'ccc', 'ddd'}
> $ for i in 1 2; do ccm node${i} nodetool flush; done
> # Results of sstable2json of node2. The row inserted from node1(3.0.10) is different from the row inserted from node2(2.1.16).
> $ ccm node2 json -k test -c test
> running
> ['/home/zzheng/.ccm/test/node2/data0/test/test-5406ee80dbdb11e6a175f57c4c7c85f3/test-test-ka-1-Data.db']
> -- test-test-ka-1-Data.db -----
> [
> {"key": "aaa",
>  "cells": [["","",1484564624769577],
>            ["value1","value2:!",1484564624769576,"t",1484564624],
>            ["value1:616161","",1484564624769577],
>            ["value1:626262","",1484564624769577],
>            ["value2:636363","",1484564624769577],
>            ["value2:646464","",1484564624769577]]},
> {"key": "bbb",
>  "cells": [["","",1484564634508029],
>            ["value1:_","value1:!",1484564634508028,"t",1484564634],
>            ["value1:616161","",1484564634508029],
>            ["value1:626262","",1484564634508029],
>            ["value2:_","value2:!",1484564634508028,"t",1484564634],
>            ["value2:636363","",1484564634508029],
>            ["value2:646464","",1484564634508029]]}
> ]
> # Upgrade node2,3
> $ for i in `seq 2 3`; do ccm node${i} stop; ccm node${i} setdir -v3.0.10; ccm node${i} start;ccm node${i} nodetool upgradesstables; done
> # After upgrade node2,3, the row inserted from node1 is splitting in node2,3
> $ ccm node2 cqlsh -e "SELECT * FROM test.test ;"
>  id  | value1         | value2
> -----+----------------+----------------
>  aaa |           null |           null
>  aaa | {'aaa', 'bbb'} | {'ccc', 'ddd'}
>  bbb | {'aaa', 'bbb'} | {'ccc', 'ddd'}
> (3 rows)
> # Results of sstabledump
> # node1
> [
>   {
>     "partition" : {
>       "key" : [ "aaa" ],
>       "position" : 0
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 17,
>         "liveness_info" : { "tstamp" : "2017-01-16T11:03:44.769577Z" },
>         "cells" : [
>           { "name" : "value1", "deletion_info" : { "marked_deleted" : 
> "2017-01-16T11:03:44.769576Z", "local_delete_time" : "2017-01-16T11:03:44Z" } 
> },
>           { "name" : "value1", "path" : [ "aaa" ], "value" : "" },
>           { "name" : "value1", "path" : [ "bbb" ], "value" : "" },
>           { "name" : "value2", "deletion_info" : { "marked_deleted" : 
> "2017-01-16T11:03:44.769576Z", "local_delete_time" : "2017-01-16T11:03:44Z" } 
> },
>           { "name" : "value2", "path" : [ "ccc" ], "value" : "" },
>           { "name" : "value2", "path" : [ "ddd" ], "value" : "" }
>         ]
>       }
>     ]
>   },
>   {
>     "partition" : {
>       "key" : [ "bbb" ],
>       "position" : 48
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 65,
>         "liveness_info" : { "tstamp" : "2017-01-16T11:03:54.508029Z" },
>         "cells" : [
>           { "name" : "value1", "deletion_info" : { "marked_deleted" : 
> "2017-01-16T11:03:54.508028Z", "local_delete_time" : "2017-01-16T11:03:54Z" } 
> },
>           { "name" : "value1", "path" : [ "aaa" ], "value" : "" },
>           { "name" : "value1", "path" : [ "bbb" ], "value" : "" },
>           { "name" : "value2", "deletion_info" : { "marked_deleted" : 
> "2017-01-16T11:03:54.508028Z", "local_delete_time" : "2017-01-16T11:03:54Z" } 
> },
>           { "name" : "value2", "path" : [ "ccc" ], "value" : "" },
>           { "name" : "value2", "path" : [ "ddd" ], "value" : "" }
>         ]
>       }
>     ]
>   }
> ]
> # node2
> [
>   {
>     "partition" : {
>       "key" : [ "aaa" ],
>       "position" : 0
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 17,
>         "liveness_info" : { "tstamp" : "2017-01-16T11:03:44.769577Z" },
>         "cells" : [ ]
>       },
>       {
>         "type" : "row",
>         "position" : 22,
>         "deletion_info" : { "marked_deleted" : "2017-01-16T11:03:44.769576Z", 
> "local_delete_time" : "2017-01-16T11:03:44Z" },
>         "cells" : [
>           { "name" : "value1", "path" : [ "aaa" ], "value" : "", "tstamp" : 
> "2017-01-16T11:03:44.769577Z" },
>           { "name" : "value1", "path" : [ "bbb" ], "value" : "", "tstamp" : 
> "2017-01-16T11:03:44.769577Z" },
>           { "name" : "value2", "path" : [ "ccc" ], "value" : "", "tstamp" : 
> "2017-01-16T11:03:44.769577Z" },
>           { "name" : "value2", "path" : [ "ddd" ], "value" : "", "tstamp" : 
> "2017-01-16T11:03:44.769577Z" }
>         ]
>       }
>     ]
>   },
>   {
>     "partition" : {
>       "key" : [ "bbb" ],
>       "position" : 57
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 74,
>         "liveness_info" : { "tstamp" : "2017-01-16T11:03:54.508029Z" },
>         "cells" : [
>           { "name" : "value1", "deletion_info" : { "marked_deleted" : 
> "2017-01-16T11:03:54.508028Z", "local_delete_time" : "2017-01-16T11:03:54Z" } 
> },
>           { "name" : "value1", "path" : [ "aaa" ], "value" : "" },
>           { "name" : "value1", "path" : [ "bbb" ], "value" : "" },
>           { "name" : "value2", "deletion_info" : { "marked_deleted" : 
> "2017-01-16T11:03:54.508028Z", "local_delete_time" : "2017-01-16T11:03:54Z" } 
> },
>           { "name" : "value2", "path" : [ "ccc" ], "value" : "" },
>           { "name" : "value2", "path" : [ "ddd" ], "value" : "" }
>         ]
>       }
>     ]
>   }
> ]
> {code}
> Another example of row splitting is as follows.
> {code}
> $ ccm create test2 -v 2.1.16 -n 3 -s
> Current cluster is now: test2
> $ ccm node1 cqlsh -e "CREATE KEYSPACE test WITH replication = {'class':'SimpleStrategy', 'replication_factor':3}"
> $ ccm node1 cqlsh -e "CREATE TABLE test.text_set_set (id text PRIMARY KEY, value1 text, value2 set<text>, value3 set<text>);"
> $ for i in `seq 1`; do ccm node${i} stop; ccm node${i} setdir -v3.0.10; ccm node${i} start;ccm node${i} nodetool upgradesstables; done
> $ ccm node1 cqlsh -e "INSERT INTO test.text_set_set (id, value1, value2, value3) values ('aaa', 'aaa', {'aaa', 'bbb'}, {'ccc', 'ddd'});"
> $ ccm node1 cqlsh -e "SELECT * FROM test.text_set_set;"
>  id  | value1 | value2         | value3
> -----+--------+----------------+----------------
>  aaa |    aaa |           null |           null
>  aaa |   null | {'aaa', 'bbb'} | {'ccc', 'ddd'}
> (2 rows)
> {code}
> As far as I have investigated, the conditions under which the problem 
> occurs are as follows.
> * The table schema contains multiple collections.
> * A row whose collection columns are not null is inserted through a 3.x 
> node while both 2.1 and 3.x nodes exist in the cluster.
> * Rows in the sstables of a node that was running 2.1 at the time the row 
> was inserted are split after upgrading to 3.x.
> Thanks.



