[ 
https://issues.apache.org/jira/browse/CASSANDRA-10822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045374#comment-15045374
 ] 

Russ Hatch commented on CASSANDRA-10822:
----------------------------------------

This appears to be happening when upgrading through 2.2 on the way to 3.0.

Confirmed happening on 3.0.0 as well as 3.0 head.

As Andy stated, the static column isn't needed to see the issue.
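
For anyone who wants to retry this without the static column, here is a minimal sketch that generates the CQL from the quoted reproduction below with `name text static` dropped. The helper name `repro_statements` is hypothetical, and this is only an assumed variant of the original steps (the reverse clustering order is kept as in the ticket), not the exact statements that were run.

```python
def repro_statements(keyspace="financial", table="symbol_history"):
    """Return the reproduction's CQL statements, minus the static column."""
    fq = f"{keyspace}.{table}"
    stmts = [
        f"drop keyspace if exists {keyspace};",
        f"create keyspace if not exists {keyspace} with replication = "
        "{'class': 'SimpleStrategy', 'replication_factor': 1};",
        # Same table as in the ticket, but without `name text static`.
        f"create table if not exists {fq} ("
        "symbol text, year int, month int, day int, volume bigint, "
        "close double, open double, low double, high double, "
        "primary key ((symbol, year), month, day)"
        ") with clustering order by (month desc, day desc);",
    ]
    # One row per month of 2004, as in the original inserts.
    for month in range(1, 13):
        stmts.append(
            f"insert into {fq} (symbol, year, month, day, volume) "
            f"values ('CORP', 2004, {month}, 1, 100);"
        )
    # The delete that produces the tombstone for November.
    stmts.append(
        f"delete from {fq} where symbol='CORP' and year=2004 and month=11;"
    )
    return stmts


if __name__ == "__main__":
    print("\n".join(repro_statements()))
```

The printed statements could be saved to a file and run with cqlsh's `-f` option before following the flush and upgrade steps.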

> SSTable data loss when upgrading with row tombstone present
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-10822
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10822
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Andy Tolbert
>            Priority: Critical
>             Fix For: 3.0.x, 3.x
>
>
> I ran into an issue when upgrading from 2.1.11 to 3.0.0 (and also to the 
> cassandra-3.0 branch): within a partition that contains a row tombstone, 
> the rows following the tombstone were lost.
> Here's a scenario that reproduces the issue.
> Using ccm, create a single-node cluster at 2.1.11:
> {{ccm create -n 1 -v 2.1.11 -s financial}}
> Run the following queries to create the schema, populate some data, and then 
> delete the data for November:
> {noformat}
> drop keyspace if exists financial;
> create keyspace if not exists financial with replication = {'class': 
> 'SimpleStrategy', 'replication_factor' : 1 };
> create table if not exists financial.symbol_history (
>   symbol text,
>   name text static,
>   year int,
>   month int,
>   day int,
>   volume bigint,
>   close double,
>   open double,
>   low double,
>   high double,
>   primary key((symbol, year), month, day)
> ) with CLUSTERING ORDER BY (month desc, day desc);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) 
> values ('CORP', 'MegaCorp', 2004, 1, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) 
> values ('CORP', 'MegaCorp', 2004, 2, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) 
> values ('CORP', 'MegaCorp', 2004, 3, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) 
> values ('CORP', 'MegaCorp', 2004, 4, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) 
> values ('CORP', 'MegaCorp', 2004, 5, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) 
> values ('CORP', 'MegaCorp', 2004, 6, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) 
> values ('CORP', 'MegaCorp', 2004, 7, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) 
> values ('CORP', 'MegaCorp', 2004, 8, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) 
> values ('CORP', 'MegaCorp', 2004, 9, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) 
> values ('CORP', 'MegaCorp', 2004, 10, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) 
> values ('CORP', 'MegaCorp', 2004, 11, 1, 100);
> insert into financial.symbol_history (symbol, name, year, month, day, volume) 
> values ('CORP', 'MegaCorp', 2004, 12, 1, 100);
> delete from financial.symbol_history where symbol='CORP' and year = 2004 and 
> month=11;
> {noformat}
> Flush and run sstable2json on the sole Data.db file:
> {noformat}
> ccm node1 flush
> sstable2json /path/to/file.db
> {noformat}
> The output should look like the following:
> {code}
> [
> {"key": "CORP:2004",
>  "cells": [["::name","MegaCorp",1449457517033030],
>            ["12:1:","",1449457517033030],
>            ["12:1:volume","100",1449457517033030],
>            ["11:_","11:!",1449457564983269,"t",1449457564],
>            ["10:1:","",1449457516313738],
>            ["10:1:volume","100",1449457516313738],
>            ["9:1:","",1449457516310205],
>            ["9:1:volume","100",1449457516310205],
>            ["8:1:","",1449457516235664],
>            ["8:1:volume","100",1449457516235664],
>            ["7:1:","",1449457516233535],
>            ["7:1:volume","100",1449457516233535],
>            ["6:1:","",1449457516231458],
>            ["6:1:volume","100",1449457516231458],
>            ["5:1:","",1449457516228307],
>            ["5:1:volume","100",1449457516228307],
>            ["4:1:","",1449457516225415],
>            ["4:1:volume","100",1449457516225415],
>            ["3:1:","",1449457516222811],
>            ["3:1:volume","100",1449457516222811],
>            ["2:1:","",1449457516220301],
>            ["2:1:volume","100",1449457516220301],
>            ["1:1:","",1449457516210758],
>            ["1:1:volume","100",1449457516210758]]}
> ]
> {code}
> Prepare for the upgrade:
> {noformat}
> ccm node1 nodetool snapshot financial
> ccm node1 nodetool drain
> ccm node1 stop
> {noformat}
> Upgrade to cassandra-3.0 and start the node:
> {noformat}
> ccm node1 setdir -v git:cassandra-3.0
> ccm node1 start
> {noformat}
> Run the following query in cqlsh and observe that only 1 row is returned!  It 
> appears that all data following November is gone.
> {noformat}
> cqlsh> select * from financial.symbol_history;
>  symbol | year | month | day | name     | close | high | low  | open | volume
> --------+------+-------+-----+----------+-------+------+------+------+--------
>    CORP | 2004 |    12 |   1 | MegaCorp |  null | null | null | null |    100
> {noformat}
> Upgrade the sstables and query again, and you'll observe the same problem:
> {noformat}
> ccm node1 nodetool upgradesstables financial
> {noformat}
> I modified the 2.2 version of sstable2json so that it works with 3.0 
> (couldn't help myself :)), and observed 2 RangeTombstoneBoundMarker 
> occurrences for 1 delete and the rest of the data missing.
> {code}
> [
> {
>  "key": "CORP:2004",
>  "static": {
>   "cells": {
>     ["name","MegaCorp",1449457517033030]
>   }
>  },
>  "rows": [
>   {
>    "clustering": {"month": "12", "day": "1"},
>    "cells": {
>      ["volume","100",1449457517033030]
>    }
>   },
>   {
>    "tombstone": ["11:*",1449457564983269,"t",1449457564]
>   },
>   {
>    "tombstone": ["11:*",1449457564983269,"t",1449457564]
>   }
>  ]
> }
> ]
> {code}
> I'm not sure why this is happening, but I should point out that I'm using 
> static columns here and that I'm using reverse order for my clustering, so 
> maybe that makes a difference.  I'll try without static columns / regular 
> ordering to see if that makes a difference and update the ticket.
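
As a sanity check on what the query should return, here is a rough sketch that reads the 2.1 `sstable2json` output quoted above and separates live rows from range tombstones. The cell layout is inferred from that output rather than from the tool's documentation: an entry whose fourth element is `"t"` is assumed to be a range tombstone with start and end bounds in the first two elements, and a cell whose name ends in an empty column component (such as `12:1:`) is assumed to be a row marker. The sample is a trimmed copy of the ticket's output, and `summarize` is a hypothetical helper.

```python
import json

# Trimmed copy of the 2.1 sstable2json output from the ticket (months 12..9).
SAMPLE = """
[
{"key": "CORP:2004",
 "cells": [["::name","MegaCorp",1449457517033030],
           ["12:1:","",1449457517033030],
           ["12:1:volume","100",1449457517033030],
           ["11:_","11:!",1449457564983269,"t",1449457564],
           ["10:1:","",1449457516313738],
           ["10:1:volume","100",1449457516313738],
           ["9:1:","",1449457516310205],
           ["9:1:volume","100",1449457516310205]]}
]
"""


def summarize(partition):
    """Split a partition's cells into live clustering keys and range tombstones."""
    rows, tombstones = [], []
    for cell in partition["cells"]:
        if len(cell) > 3 and cell[3] == "t":
            # Assumed layout: [start bound, end bound, timestamp, "t", deletion time]
            tombstones.append((cell[0], cell[1]))
            continue
        month, day, column = cell[0].split(":", 2)
        if month and column == "":
            # Row marker cell: clustering prefix followed by an empty column name.
            rows.append((int(month), int(day)))
    return rows, tombstones


rows, tombstones = summarize(json.loads(SAMPLE)[0])
print("live rows:", rows)
print("range tombstones:", tombstones)
```

Run against the full output in the ticket, the same logic would find eleven live rows (months 12 and 10 through 1) and a single range tombstone for month 11, so the query after the upgrade should return eleven rows rather than one.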


