[
https://issues.apache.org/jira/browse/CASSANDRA-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878088#comment-15878088
]
Stefania commented on CASSANDRA-12213:
--------------------------------------
Changing the flushing order might work because the schema version has not been
updated yet. Therefore, even if the schema is not correct after the commit log
(CL) replay, the node will pull the latest version from another node. From what
I understood, it should be able to deal with any differences regardless of how
much the schema has changed in the meantime, because it fetches the affected
keyspaces again.
Ideally, on startup we should reject any schema records that do not match the
latest known schema version, but we cannot do this unless we add the schema
version to every row in the schema tables. We could maybe rely on the timestamp
of the records, though: if a record was added after the latest version record,
we could infer that the schema tables are in an inconsistent state and reject
such records. I'm not sure how dangerous this is compared to just changing the
flushing order.
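A minimal sketch of the timestamp idea, for illustration only. It is not code
from the patch: {{SchemaRecordFilter}}, {{SchemaRecord}} and {{acceptUpTo}} are
hypothetical names, and it assumes every schema record carries a write timestamp
that can be compared with the timestamp of the latest schema version record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; none of these names exist in Cassandra.
final class SchemaRecordFilter {

    // Assumed shape of a schema record: a name plus its write timestamp.
    static final class SchemaRecord {
        final String name;
        final long writeTimestampMicros;

        SchemaRecord(String name, long writeTimestampMicros) {
            this.name = name;
            this.writeTimestampMicros = writeTimestampMicros;
        }
    }

    // Keeps only the records written no later than the latest version record.
    // Anything newer is assumed to belong to a schema change whose version
    // was never recorded, so it is rejected.
    static List<SchemaRecord> acceptUpTo(List<SchemaRecord> records,
                                         long versionTimestampMicros) {
        List<SchemaRecord> accepted = new ArrayList<>();
        for (SchemaRecord record : records) {
            if (record.writeTimestampMicros <= versionTimestampMicros) {
                accepted.add(record);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<SchemaRecord> onDisk = Arrays.asList(
            new SchemaRecord("ks1.table_a", 100L),
            new SchemaRecord("ks1.table_b", 200L),
            new SchemaRecord("ks1.table_c", 300L)); // newer than the version
        for (SchemaRecord record : acceptUpTo(onDisk, 200L)) {
            System.out.println("accepted " + record.name);
        }
    }
}
```

The sketch only filters; whether dropping such records is safe in practice is
exactly the open question raised above.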
I've also looked at separating CL replay for system and non-system tables. It
doesn't seem like a trivial thing to do, unless we replay the CL twice and
handle CF-not-found exceptions, which I really don't like.
As expected, I could not reproduce the problem with the [multiplexed
run|https://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/385/]
for {{TestWriteFailures}}, but I could reproduce it once with a
[run|https://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/386/]
for {{TestCommitLog.test_commitlog_replay_on_startup}}.
Therefore, I'm running another [multiplexed
run|https://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/387/]
for {{TestCommitLog.test_commitlog_replay_on_startup}} with this patch:
||3.0||3.11||trunk||
|[patch|https://github.com/stef1927/cassandra/tree/12213-3.0]|[patch|https://github.com/stef1927/cassandra/tree/12213-3.11]|[patch|https://github.com/stef1927/cassandra/tree/12213]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12213-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12213-3.11-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12213-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12213-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12213-3.11-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12213-dtest/]|
I've basically added a new list that specifies the tables in flushing order. We
cannot simply change the existing list because we want to keep the opposite
order when truncating, and possibly for other operations too.
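The two-list arrangement can be sketched roughly as follows. This is an
illustration under assumptions, not the patch itself: the class name, the table
names and their order are hypothetical, and the flush order is assumed to be the
exact reverse of the existing list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; names and ordering are assumptions.
final class SchemaTableOrder {

    // Stands in for the existing list, whose order is kept for truncation
    // and any other operation that already depends on it.
    static final List<String> EXISTING_ORDER = Collections.unmodifiableList(
        Arrays.asList("keyspaces", "tables", "columns", "types"));

    // Separate list used only when flushing, so the existing list is untouched.
    static final List<String> FLUSH_ORDER = reversed(EXISTING_ORDER);

    // Returns an unmodifiable reversed copy without mutating the input.
    static List<String> reversed(List<String> tables) {
        List<String> copy = new ArrayList<>(tables);
        Collections.reverse(copy);
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        System.out.println("truncate: " + EXISTING_ORDER);
        System.out.println("flush:    " + FLUSH_ORDER);
    }
}
```

Keeping the flush order as its own list means callers that truncate keep seeing
the original order unchanged.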
> dtest failure in write_failures_test.TestWriteFailures.test_paxos_any
> ---------------------------------------------------------------------
>
> Key: CASSANDRA-12213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12213
> Project: Cassandra
> Issue Type: Bug
> Reporter: Craig Kodman
> Assignee: Stefania
> Labels: dtest
> Fix For: 3.11.x
>
> Attachments: jenkins-stef1927-12014-dtest-2_logs.001.tar.gz,
> node1_debug.log, node1_gc.log, node1.log, node2_debug.log, node2_gc.log,
> node2.log, node3_debug.log, node3_gc.log, node3.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.9_dtest/10/testReport/write_failures_test/TestWriteFailures/test_paxos_any
> and:
> http://cassci.datastax.com/job/cassandra-3.9_dtest/10/testReport/write_failures_test/TestWriteFailures/test_mutation_v3/
> Failed on CassCI build cassandra-3.9_dtest #10