[ 
https://issues.apache.org/jira/browse/CASSANDRA-9195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509286#comment-14509286
 ] 

Nick Bailey commented on CASSANDRA-9195:
----------------------------------------

bq. Because commitlog is acting correctly, it should not replay truncated data 
regardless of where it comes from. In this particular case sstable loader is 
leaving the table in a bad state; rather than try to live with it I prefer to 
correct it at the origin.

bq. The PITR process includes a truncation step. What I am doing here is giving 
PITR the option to correctly continue after that step.

I don't think this is right. The attached reproduction does include a 
truncation step, but that step is only there to clear the database before 
verifying that the restore works. It could be replaced with a drop and recreate 
(and then you wouldn't see this bug). In any case, users may truncate a table, 
regret it, and then try to restore from commitlogs.

I don't think sstableloader is the right place for this, because restoring from 
a snapshot is not required for point-in-time restore. You can simply archive 
every commitlog from the start and never take snapshots if you want. 
If you then replay those commitlogs up to some time before the 
truncation, C* should recognize that the replay ends strictly before any 
truncation took place and let the mutations replay.
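
As a rough illustration of the rule argued for in the paragraph above, here is a 
minimal sketch in Python. It is not Cassandra's replay code; the function name, 
its parameters, and the use of plain millisecond timestamps are all assumptions 
made for the example.

```python
from typing import Optional


def should_replay(mutation_ts: int,
                  truncated_at: Optional[int],
                  restore_point: Optional[int]) -> bool:
    """Hypothetical replay decision for one commitlog mutation.

    All arguments are assumed to be milliseconds since the epoch:
      mutation_ts   - when the mutation was written
      truncated_at  - when the table was last truncated (None if never)
      restore_point - point-in-time restore target (None for a full replay)
    """
    # A point-in-time restore never applies mutations past its target.
    if restore_point is not None and mutation_ts > restore_point:
        return False

    # Without a truncation record there is nothing to filter out.
    if truncated_at is None:
        return True

    # The restore target lies strictly before the truncation, so the
    # truncation has not "happened yet" in the state being restored.
    if restore_point is not None and restore_point < truncated_at:
        return True

    # Otherwise honor the truncation: skip data written at or before it.
    return mutation_ts > truncated_at
```

With a truncation at t=200, a mutation written at t=100 is skipped on a full 
replay but applied when the restore target is t=150, which is the behavior the 
reproduction expects.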

Also, I still think truncation records should exist in the commitlogs 
themselves, or at the very least there should be a historical list of 
truncations.

> commitlog replay only actually replays mutation every other time
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-9195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9195
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jon Moses
>            Assignee: Branimir Lambov
>            Priority: Critical
>             Fix For: 2.1.5
>
>         Attachments: 9195-v2.1.patch, loader.py
>
>
> Version: Cassandra 2.1.4.374 | DSE 4.7.0
> The main issue here is that the restore-cycle only replays the mutations
> every other try.  On the first try, it will restore the snapshot as expected
> and the cassandra system load will show that it's reading the mutations, but
> they do not actually get replayed, and at the end you're left with only the
> snapshot data (2k records).
> If you re-run the restore-cycle again, the commitlogs are replayed as expected,
> and the data expected is present in the table (4k records, with a spot check of
> record 4500, as it's in the commitlog but not the snapshot).
> Then if you run the cycle again, it will fail.  Then again, and it will work.
> The work/not work pattern continues.  Even re-running the commitlog replay a 2nd
> time, without reloading the snapshot doesn't work
> The load process is:
> * Modify commitlog segment to 1mb
> * Archive to directory
> * create keyspace/table
> * insert base data
> * initial snapshot
> * write more data
> * capture timestamp
> * write more data
> * final snapshot
> * copy commitlogs to 2nd location
> * modify cassandra-env to replay only specified keyspace
> * modify commitlog properties to restore from 2nd location, with noted timestamp
> The restore cycle is:
> * truncate table
> * sstableload snapshot
> * flush
> * output data status
> * restart to replay commitlogs
> * output data status
> ====
> See attached .py for a mostly automated reproduction scenario.  It expects 
> DSE (and I found it with DSE 4.7.0-1), rather than "actual" Cassandra, but 
> it's not using any DSE specific features.  The script looks for the configs 
> in the DSE locations, but they're set at the top, and there's only 2 places 
> where dse is restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
