[
https://issues.apache.org/jira/browse/CASSANDRA-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876328#comment-15876328
]
Stefania commented on CASSANDRA-12213:
--------------------------------------
I'm trying to reproduce the problem
[here|https://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/385/]
but I suspect it's going to be quite hard to reproduce.
From the analysis of the log files attached to this ticket on Sep 22 and July
15, in both cases the node was stopped whilst it was creating a user table and
flushing the schema tables. Afterwards, the node fails to start. The logs are
not consistent as to which schema tables were flushed exactly, but DEBUG
messages are logged asynchronously and so some may be missing. My theory is
that {{keyspaces}} and {{tables}} were flushed, but {{columns}} was not.
In {{SchemaKeyspace.flush()}}, the tables are flushed sequentially, each flush
waiting for the previous one to complete. Perhaps they should be flushed in
parallel:
{code}
static void flush()
{
    if (!Boolean.getBoolean("cassandra.unsafesystem"))
        ALL.forEach(table -> FBUtilities.waitOnFuture(getSchemaCFS(table).forceFlush()));
}
{code}
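To illustrate what flushing in parallel could look like, here is a minimal, self-contained sketch. It is not the Cassandra code: {{forceFlush(String)}} is a hypothetical stand-in for {{getSchemaCFS(table).forceFlush()}}, and {{CompletableFuture}} is assumed in place of whatever future type the real method returns. The idea is only to start every flush first and wait for all of them afterwards.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class ParallelFlushSketch
{
    // Hypothetical stand-in for getSchemaCFS(table).forceFlush():
    // returns a future that completes once the table has been "flushed".
    static CompletableFuture<String> forceFlush(String table)
    {
        return CompletableFuture.supplyAsync(() -> table);
    }

    // Start all flushes up front, then block until every one has completed.
    static List<String> flushAll(List<String> tables)
    {
        List<CompletableFuture<String>> futures = tables.stream()
                                                        .map(ParallelFlushSketch::forceFlush)
                                                        .collect(Collectors.toList());

        // allOf completes only when every individual flush future has completed.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

        // All futures are done at this point, so join() returns immediately.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }
}
```

Note that this would only narrow the window in which some schema tables are flushed and others are not; it would not make the flush atomic.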
The shutdown hook also flushes all system tables. It would not have run for the
test that reproduced it on Sep 22, since it uses a {{kill -9}}, but it should
have run for {{TestWriteFailures}} since this test uses a gentle stop. However,
node 3 (the one with the assertion) did not announce shutdown on Gossip, and
got convicted by other nodes, so I am guessing that for some unknown reason the
shutdown hook did not run.
On startup, the schema is loaded before the commit log is replayed, so if the
{{columns}} table had indeed not been flushed, its latest writes would still be
only in the commit log when the schema is loaded, and then we have an
explanation.
Regardless of the shutdown hook, I think we should load the schema after
recovering the commit log if possible, at least the CL for the system tables.
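The ordering argument can be shown with a toy model. This is purely illustrative and assumes nothing about the real startup code: the class, its fields and its methods are all hypothetical. {{keyspaces}} and {{tables}} are treated as flushed, while {{columns}} exists only in the commit log, as in the theory above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StartupOrderSketch
{
    // Hypothetical model: schema tables that reached disk before the node stopped.
    final Set<String> onDisk = new HashSet<>();
    // Hypothetical model: schema writes that only reached the commit log.
    final List<String> commitLog = new ArrayList<>();

    // Replaying the commit log makes the unflushed writes visible.
    void replayCommitLog()
    {
        onDisk.addAll(commitLog);
        commitLog.clear();
    }

    // The schema loads consistently only if all three schema tables are visible.
    boolean loadSchema()
    {
        return onDisk.containsAll(Arrays.asList("keyspaces", "tables", "columns"));
    }

    // Returns whether the schema load saw a consistent schema.
    static boolean startup(boolean replayFirst)
    {
        StartupOrderSketch node = new StartupOrderSketch();
        node.onDisk.add("keyspaces");
        node.onDisk.add("tables");
        node.commitLog.add("columns"); // not flushed before the node stopped

        if (replayFirst)
            node.replayCommitLog();
        boolean consistent = node.loadSchema();
        if (!replayFirst)
            node.replayCommitLog(); // too late: the schema has already been loaded
        return consistent;
    }
}
```

In this model {{startup(false)}} loads an inconsistent schema, while {{startup(true)}} does not, which is the motivation for replaying at least the system tables' commit log first.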
[~iamaleksey], [~thobbs] WDYT?
> dtest failure in write_failures_test.TestWriteFailures.test_paxos_any
> ---------------------------------------------------------------------
>
> Key: CASSANDRA-12213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12213
> Project: Cassandra
> Issue Type: Bug
> Reporter: Craig Kodman
> Assignee: Stefania
> Labels: dtest
> Fix For: 3.11.x
>
> Attachments: jenkins-stef1927-12014-dtest-2_logs.001.tar.gz,
> node1_debug.log, node1_gc.log, node1.log, node2_debug.log, node2_gc.log,
> node2.log, node3_debug.log, node3_gc.log, node3.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.9_dtest/10/testReport/write_failures_test/TestWriteFailures/test_paxos_any
> and:
> http://cassci.datastax.com/job/cassandra-3.9_dtest/10/testReport/write_failures_test/TestWriteFailures/test_mutation_v3/
> Failed on CassCI build cassandra-3.9_dtest #10
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)