[ 
https://issues.apache.org/jira/browse/CASSANDRA-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876328#comment-15876328
 ] 

Stefania commented on CASSANDRA-12213:
--------------------------------------

I'm trying to reproduce the problem 
[here|https://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/385/]
 but I suspect it's going to be quite hard to reproduce.

According to the log files attached to this ticket on Sep 22 and July 15, in 
both cases the node was stopped whilst it was creating a user table and 
flushing the schema tables. Afterwards, the node failed to start. The logs are 
not consistent as to which schema tables were flushed exactly, but DEBUG 
messages are logged asynchronously and so some may be missing. My theory is 
that {{keyspaces}} and {{tables}} were flushed, but {{columns}} was not. 

In {{SchemaKeyspace.flush()}}, the tables are flushed sequentially; perhaps 
they should be flushed in parallel:

{code}
    static void flush()
    {
        if (!Boolean.getBoolean("cassandra.unsafesystem"))
            ALL.forEach(table -> FBUtilities.waitOnFuture(getSchemaCFS(table).forceFlush()));
    }
{code}
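
A minimal sketch of what flushing in parallel could look like: start every flush first, then wait on all of them together. This is not Cassandra code. {{ParallelFlushSketch}}, {{flushAll}} and the {{flusher}} function are hypothetical stand-ins for {{getSchemaCFS(table).forceFlush()}}, and {{CompletableFuture}} is used only to keep the example self-contained. Note that this would only narrow the window in which some schema tables are flushed and others are not; it would not make the flush atomic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative only: "flusher" stands in for getSchemaCFS(table).forceFlush().
final class ParallelFlushSketch
{
    static <T> void flushAll(List<T> tables, Function<T, CompletableFuture<?>> flusher)
    {
        // Kick off every flush before waiting on any of them...
        CompletableFuture<?>[] futures = tables.stream()
                                               .map(flusher)
                                               .toArray(CompletableFuture[]::new);
        // ...then block until all of them have completed.
        CompletableFuture.allOf(futures).join();
    }

    public static void main(String[] args)
    {
        Set<String> flushed = ConcurrentHashMap.newKeySet();
        // Pretend flush: record the table name on another thread.
        flushAll(Arrays.asList("keyspaces", "tables", "columns"),
                 table -> CompletableFuture.runAsync(() -> flushed.add(table)));
        System.out.println("flushed: " + flushed.size() + " tables");
    }
}
```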

The shutdown hook also flushes all system tables. It would not have run for the 
test that reproduced it on Sep 22, since it uses a {{kill -9}}, but it should 
have run for {{TestWriteFailures}} since this test uses a gentle stop. However, 
node 3 (the one with the assertion) did not announce shutdown on Gossip, and 
got convicted by other nodes, so I am guessing that for some unknown reason the 
shutdown hook did not run.

On startup, the schema is loaded before the commit log is replayed. So if the 
{{columns}} table had indeed not been flushed, the schema would have been 
loaded without it, which would explain the failure to start.

Regardless of the shutdown hook, I think we should load the schema after 
recovering the commit log if possible, at least the CL for the system tables. 
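
To illustrate why the ordering matters, here is a toy model of the theory above, not Cassandra code. Every name in it ({{StartupOrderSketch}}, {{interruptedNode}}, {{schemaIsConsistent}}, the {{ks.users}} table) is made up for the example. It assumes {{keyspaces}} and {{tables}} reached disk while the {{columns}} row exists only in the commit log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: schema tables are lists of row names, the commit log is a list of mutations.
final class StartupOrderSketch
{
    // Rows that were flushed before the node stopped, keyed by schema table name.
    final Map<String, List<String>> flushed = new HashMap<>();
    // Mutations that exist only in the commit log: {schema table, row}.
    final List<String[]> commitLog = new ArrayList<>();

    // Hypothetical state of a node stopped part-way through flushing the schema tables.
    static StartupOrderSketch interruptedNode()
    {
        StartupOrderSketch node = new StartupOrderSketch();
        node.flushed.put("keyspaces", new ArrayList<>(Arrays.asList("ks")));
        node.flushed.put("tables", new ArrayList<>(Arrays.asList("ks.users")));
        node.commitLog.add(new String[]{ "columns", "ks.users.id" });
        return node;
    }

    void replayCommitLog()
    {
        for (String[] mutation : commitLog)
            flushed.computeIfAbsent(mutation[0], k -> new ArrayList<>()).add(mutation[1]);
        commitLog.clear();
    }

    // True if every row in "tables" has at least one matching row in "columns".
    boolean schemaIsConsistent()
    {
        List<String> columns = flushed.getOrDefault("columns", Collections.emptyList());
        for (String table : flushed.getOrDefault("tables", Collections.emptyList()))
            if (columns.stream().noneMatch(c -> c.startsWith(table + ".")))
                return false;
        return true;
    }

    public static void main(String[] args)
    {
        StartupOrderSketch node = interruptedNode();
        System.out.println("schema loaded before replay: " + node.schemaIsConsistent());
        node.replayCommitLog();
        System.out.println("schema loaded after replay:  " + node.schemaIsConsistent());
    }
}
```

In this model, loading the schema before replay sees a table with no columns, while loading it after replay sees a complete definition.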

[~iamaleksey], [~thobbs] WDYT?

> dtest failure in write_failures_test.TestWriteFailures.test_paxos_any
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-12213
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12213
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Craig Kodman
>            Assignee: Stefania
>              Labels: dtest
>             Fix For: 3.11.x
>
>         Attachments: jenkins-stef1927-12014-dtest-2_logs.001.tar.gz, 
> node1_debug.log, node1_gc.log, node1.log, node2_debug.log, node2_gc.log, 
> node2.log, node3_debug.log, node3_gc.log, node3.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.9_dtest/10/testReport/write_failures_test/TestWriteFailures/test_paxos_any
> and:
> http://cassci.datastax.com/job/cassandra-3.9_dtest/10/testReport/write_failures_test/TestWriteFailures/test_mutation_v3/
> Failed on CassCI build cassandra-3.9_dtest #10



