[ 
https://issues.apache.org/jira/browse/CASSANDRA-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878088#comment-15878088
 ] 

Stefania commented on CASSANDRA-12213:
--------------------------------------

Changing the flushing order might work because the schema version has not been 
updated yet. Therefore, even if the schema is not correct after the CL replay, 
the node will pull the latest version from another node. From what I understood, 
the node should be able to deal with any differences regardless of how much the 
schema has changed in the meantime, because it fetches the affected keyspaces 
again. 

Ideally, on startup we should reject any schema records that do not match the 
latest known schema version, but we cannot do this unless we add the schema 
version to every row in the schema tables. We could maybe rely on the timestamp 
of the records, though: if a record was added after the latest version record, 
we could infer that the schema tables are in an inconsistent state and reject 
such records. I'm not sure how dangerous this is compared to just changing the 
flushing order.
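
To make the timestamp idea concrete, here is a minimal sketch in Python. It is 
not taken from the patch: {{SchemaRecord}}, {{split_by_version_timestamp}} and 
the field names are hypothetical, and it assumes each schema record carries a 
write timestamp that can be compared with the timestamp of the latest version 
record.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SchemaRecord:
    """Hypothetical stand-in for one row read from a schema table."""
    table: str            # schema table the row came from
    key: str              # identifier of the schema element
    write_timestamp: int  # write time of the row (assumed to be available)


def split_by_version_timestamp(
    records: List[SchemaRecord], version_timestamp: int
) -> Tuple[List[SchemaRecord], List[SchemaRecord]]:
    """Split records into (accepted, rejected).

    A record written after the latest version record is treated as a sign
    that the schema tables are inconsistent, so it is rejected.
    """
    accepted = [r for r in records if r.write_timestamp <= version_timestamp]
    rejected = [r for r in records if r.write_timestamp > version_timestamp]
    return accepted, rejected
```

With a version record written at timestamp 200, a row written at 100 would be 
accepted and a row written at 250 would be rejected. Whether timestamps are 
reliable enough for this is exactly the open question above.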

I've also looked at separating CL replay for system and non-system tables. It 
doesn't seem like a trivial thing to do, unless we replay the CL twice and 
handle CF not found exceptions, which I really don't like.

As expected, I could not reproduce the problem with the [multiplexed 
run|https://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/385/]
 for {{TestWriteFailures}}, but I could reproduce it once with a 
[run|https://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/386/]
 for {{TestCommitLog.test_commitlog_replay_on_startup}}.

Therefore, I'm running another [multiplexed 
run|https://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/387/]
 for {{TestCommitLog.test_commitlog_replay_on_startup}} with this patch:

||3.0||3.11||trunk||
|[patch|https://github.com/stef1927/cassandra/tree/12213-3.0]|[patch|https://github.com/stef1927/cassandra/tree/12213-3.11]|[patch|https://github.com/stef1927/cassandra/tree/12213]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12213-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12213-3.11-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12213-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12213-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12213-3.11-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12213-dtest/]|

I've basically added a new list that specifies the tables in flushing order. We 
cannot simply change the existing list because we want to keep the opposite 
order when truncating, and possibly for other operations too.
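
The shape of that change can be sketched roughly as follows, in Python rather 
than the Java of the actual patch. All names here ({{ALL_TABLES}}, 
{{FLUSH_ORDER}}, the two helper functions) are hypothetical, and the table 
names and their relative order are illustrative only; the point is just that 
flushing reads from its own list while truncation keeps using the existing one.

```python
from typing import Callable, List

# Existing list (hypothetical contents); truncation keeps iterating in this order.
ALL_TABLES: List[str] = ["columns", "tables", "keyspaces"]

# New, separate list used only for flushing; here simply the opposite order.
FLUSH_ORDER: List[str] = list(reversed(ALL_TABLES))


def flush_schema_tables(flush: Callable[[str], None]) -> None:
    """Flush every schema table, following the dedicated flushing order."""
    for table in FLUSH_ORDER:
        flush(table)


def truncate_schema_tables(truncate: Callable[[str], None]) -> None:
    """Truncate every schema table, following the existing (unchanged) order."""
    for table in ALL_TABLES:
        truncate(table)
```

Keeping two lists means the flushing order can change without affecting any 
other code that iterates over the existing one.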

> dtest failure in write_failures_test.TestWriteFailures.test_paxos_any
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-12213
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12213
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Craig Kodman
>            Assignee: Stefania
>              Labels: dtest
>             Fix For: 3.11.x
>
>         Attachments: jenkins-stef1927-12014-dtest-2_logs.001.tar.gz, 
> node1_debug.log, node1_gc.log, node1.log, node2_debug.log, node2_gc.log, 
> node2.log, node3_debug.log, node3_gc.log, node3.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.9_dtest/10/testReport/write_failures_test/TestWriteFailures/test_paxos_any
> and:
> http://cassci.datastax.com/job/cassandra-3.9_dtest/10/testReport/write_failures_test/TestWriteFailures/test_mutation_v3/
> Failed on CassCI build cassandra-3.9_dtest #10



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)