[ 
https://issues.apache.org/jira/browse/CASSANDRA-11887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572637#comment-15572637
 ] 

Tim Kieschnick edited comment on CASSANDRA-11887 at 10/13/16 8:01 PM:
----------------------------------------------------------------------

This issue is still occurring.

We migrated in place directly from 2.2.5 to 3.0.9 and ran nodetool 
upgradesstables on each of the 6 storage nodes.  We see the same duplicate row 
problem.  It has occurred in multiple tables during our migration, and each of 
those tables uses a map collection type.  The original issue was related to 
UDTs, but we see it specifically with map types.

The simplest scenario is below.

{noformat}
cqlsh:*****> consistency all;
Consistency level set to ALL.
cqlsh:*****> desc table *****.customers;
CREATE TABLE *****.customers (
    customer_id uuid PRIMARY KEY,
    ....
    order_by decimal,
    udf_values map<text, text>
) WITH ... 
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'enabled': 'false'}
    AND gc_grace_seconds = 2592000
    ...;
cqlsh:*****> select customer_id, order_by, last_modified, udf_values from customers where customer_id=4f7c602e-9022-431f-a949-f4382988c862;

 customer_id                          | order_by | last_modified            | udf_values
--------------------------------------+----------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------
 4f7c602e-9022-431f-a949-f4382988c862 |     null | 2016-10-13 12:51:36+0000 | {'STR_LATEST_RECORD_DATE': '2016-10-08T07:00:00.000Z', 'STR_LOAD_DATE': '2016-10-12T20:21:30.186Z', 'TCLK_LOAD_DATE': '2016-10-13T12:51:34.944Z'}
 4f7c602e-9022-431f-a949-f4382988c862 |     null |                     null | {'STR_LATEST_RECORD_DATE': '2016-09-24T07:00:00.000Z', 'STR_LOAD_DATE': '2016-09-28T20:25:16.900Z', 'TCLK_LOAD_DATE': '2016-10-03T12:51:56.407Z'}

(2 rows)
cqlsh:abvprp> update customers set order_by=1, last_modified=toTimestamp(now()) where customer_id=4f7c602e-9022-431f-a949-f4382988c862;
cqlsh:abvprp> select customer_id, order_by, last_modified, udf_values from customers where customer_id=4f7c602e-9022-431f-a949-f4382988c862;

 customer_id                          | order_by | last_modified            | udf_values
--------------------------------------+----------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------
 4f7c602e-9022-431f-a949-f4382988c862 |        1 | 2016-10-13 17:20:32+0000 | {'STR_LATEST_RECORD_DATE': '2016-10-08T07:00:00.000Z', 'STR_LOAD_DATE': '2016-10-12T20:21:30.186Z', 'TCLK_LOAD_DATE': '2016-10-13T12:51:34.944Z'}
 4f7c602e-9022-431f-a949-f4382988c862 |     null |                     null | {'STR_LATEST_RECORD_DATE': '2016-09-24T07:00:00.000Z', 'STR_LOAD_DATE': '2016-09-28T20:25:16.900Z', 'TCLK_LOAD_DATE': '2016-10-03T12:51:56.407Z'}

(2 rows)
{noformat}

We also have a table with UDTs, and we did not detect any duplicate rows in it.
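
For anyone trying to confirm the same symptom on their own tables, a check along these lines may help. It is only a sketch: it assumes the rows have already been fetched into Python dicts (through a driver or an export), and the function name find_duplicate_keys and the second customer_id value are made up for illustration.

```python
from collections import Counter


def find_duplicate_keys(rows, key_columns):
    """Return the primary-key tuples that appear in more than one row."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Sample rows: the first two mirror the duplicated customer shown above;
# the third uses a hypothetical customer_id that occurs only once.
rows = [
    {"customer_id": "4f7c602e-9022-431f-a949-f4382988c862", "order_by": 1},
    {"customer_id": "4f7c602e-9022-431f-a949-f4382988c862", "order_by": None},
    {"customer_id": "00000000-0000-0000-0000-000000000001", "order_by": None},
]

for key in find_duplicate_keys(rows, ["customer_id"]):
    print("duplicate primary key:", key)
```

On a healthy table the function should return an empty list, since a primary key identifies at most one row.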

Can you please reopen this issue and investigate why map columns produce 
duplicate rows with the same primary key?  

Thank you.



> Duplicate rows after a 2.2.5 to 3.0.4 migration
> -----------------------------------------------
>
>                 Key: CASSANDRA-11887
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11887
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Julien Anguenot
>            Priority: Blocker
>
> After migrating from 2.2.5 to 3.0.4, some tables seem to carry duplicate 
> primary keys.
> Below is an example. Note that repair / scrub of such tables does not seem 
> to fix or indicate any issues.
> *Table definition*:
> {code}
> CREATE TABLE core.edge_ipsec_vpn_service (
>     edge_uuid text PRIMARY KEY,
>     enabled boolean,
>     endpoints set<frozen<edge_ipsec_vpn_endpoint>>,
>     tunnels set<frozen<edge_ipsec_vpn_tunnel>>
> ) WITH bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
> {code}
> *UDTs:*
> {code}
> CREATE TYPE core.edge_ipsec_vpn_endpoint (
>     network text,
>     public_ip text
> );
> CREATE TYPE core.edge_ipsec_vpn_tunnel (
>     name text,
>     description text,
>     peer_ip_address text,
>     peer_id text,
>     local_ip_address text,
>     local_id text,
>     local_subnets frozen<set<frozen<edge_ipsec_vpn_subnet>>>,
>     peer_subnets frozen<set<frozen<edge_ipsec_vpn_subnet>>>,
>     shared_secret text,
>     shared_secret_encrypted boolean,
>     encryption_protocol text,
>     mtu int,
>     enabled boolean,
>     operational boolean,
>     error_details text,
>     vpn_peer frozen<edge_ipsec_vpn_peer>
> );
> CREATE TYPE core.edge_ipsec_vpn_subnet (
>     name text,
>     gateway text,
>     netmask text
> );
> CREATE TYPE core.edge_ipsec_vpn_peer (
>     type text,
>     id text,
>     name text,
>     vcd_url text,
>     vcd_org text,
>     vcd_username text
> );
> {code}
> sstabledump extract (IP addresses and secrets hidden):
> {code}
> [...]
>  {
>     "partition" : {
>       "key" : [ "84d567cc-0165-4e64-ab97-3a9d06370ba9" ],
>       "position" : 131146
>     },
>     "rows" : [
>       {
>         "type" : "row",
>         "position" : 131236,
>         "liveness_info" : { "tstamp" : "2016-05-06T17:07:15.416003Z" },
>         "cells" : [
>           { "name" : "enabled", "value" : "true" },
>           { "name" : "tunnels", "path" : [ "XXX::1.2.3.4:1.2.3.4:1.2.3.4:1.2.3.4:XXX:XXX:false:AES256:1500:true:false::third party\\:1.2.3.4\\:\\:\\:\\:" ], "value" : "" }
>         ]
>       },
>       {
>         "type" : "row",
>         "position" : 131597,
>         "cells" : [
>           { "name" : "endpoints", "path" : [ "XXX" ], "value" : "", "tstamp" : "2016-03-29T08:05:38.297015Z" },
>           { "name" : "tunnels", "path" : [ "XXX::1.2.3.4:1.2.3.4:1.2.3.4:1.2.3.4:XXX:XXX:false:AES256:1500:true:true::third party\\:1.2.3.4\\:\\:\\:\\:" ], "value" : "", "tstamp" : "2016-03-29T08:05:38.297015Z" },
>           { "name" : "tunnels", "path" : [ "XXX::1.2.3.4:1.2.3.4:1.2.3.4:1.2.3.4:XXX:XXX:false:AES256:1500:true:false::third party\\:1.2.3.4\\:\\:\\:\\:" ], "value" : "", "tstamp" : "2016-03-14T18:05:07.262001Z" },
>           { "name" : "tunnels", "path" : [ "XXX::1.2.3.4:1.2.3.4:1.2.3.4:1.2.3.4XXX:XXX:false:AES256:1500:true:true::third party\\:1.2.3.4\\:\\:\\:\\:" ], "value" : "", "tstamp" : "2016-03-29T08:05:38.297015Z" }
>         ]
>       },
>       {
>         "type" : "row",
>         "position" : 133644,
>         "cells" : [
>           { "name" : "tunnels", "path" : [ "XXX::1.2.3.4:1.2.3.4:1.2.3.4:1.2.3.4:XXX:XXX:false:AES256:1500:true:true::third party\\:1.2.3.4\\:\\:\\:\\:" ], "value" : "", "tstamp" : "2016-03-29T07:05:27.213013Z" },
>           { "name" : "tunnels", "path" : [ "XXX::1.2.3.4.7:1.2.3.4:1.2.3.4:1.2.3.4:XXX:XXX:false:AES256:1500:true:true::third party\\:1.2.3.4\\:\\:\\:\\:" ], "value" : "", "tstamp" : "2016-03-29T07:05:27.213013Z" }
>         ]
>       }
>     ]
>   },
> [...]
> [...]
> {code}
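
The sstabledump extract above shows three "row" entries under a single partition key, although the table has no clustering columns and so each partition would be expected to hold at most one row. A scan like the following might help locate other affected partitions. It is a sketch under assumptions: it treats the dump as a JSON array of partition objects shaped like the extract, the function name partitions_with_multiple_rows is made up, and the sample data is a trimmed copy of the extract (cells omitted) plus one hypothetical healthy partition.

```python
import json


def partitions_with_multiple_rows(dump):
    """Return (partition_key, row_count) for partitions holding more than one row.

    Only meaningful for tables without clustering columns, where each
    partition is expected to contain at most one row.
    """
    suspicious = []
    for entry in dump:
        rows = [r for r in entry.get("rows", []) if r.get("type") == "row"]
        if len(rows) > 1:
            suspicious.append((tuple(entry["partition"]["key"]), len(rows)))
    return suspicious


# Trimmed sample: the first partition mirrors the extract above (cells left
# empty); the second partition and its key are hypothetical.
sample = json.loads("""
[
  {
    "partition": {"key": ["84d567cc-0165-4e64-ab97-3a9d06370ba9"], "position": 131146},
    "rows": [
      {"type": "row", "position": 131236, "cells": []},
      {"type": "row", "position": 131597, "cells": []},
      {"type": "row", "position": 133644, "cells": []}
    ]
  },
  {
    "partition": {"key": ["hypothetical-healthy-key"], "position": 0},
    "rows": [
      {"type": "row", "position": 30, "cells": []}
    ]
  }
]
""")

for key, count in partitions_with_multiple_rows(sample):
    print(key, "has", count, "rows")
```

With a real dump, the sample string would be replaced by the saved sstabledump output loaded through json.load, assuming the file contains the complete JSON array rather than an excerpt.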



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
