[ https://issues.apache.org/jira/browse/CASSANDRA-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-8853:
---------------------------------------
    Attachment: 8853.txt

The intermittent test failures are caused by a race between gossip and setting 
up the system_auth keyspace in StorageService#doAuthSetup.

Between checking for the existence of the system_auth keyspace and the call to
MigrationManager#announceNewKeyspace, a peer can be marked alive by gossip and
a schema migration can complete. I've verified this behaviour by adding extra
debug logging, and then reproduced it by hacking in an
IEndpointStateChangeSubscriber that waits on an onAlive event during
doAuthSetup. Likewise, the migration can also be interleaved with the checks
for the tables in system_auth.
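To make the interleaving concrete, here is a minimal, deterministic sketch of the check-then-act race in plain Java. It does not use Cassandra's classes: ToySchema, DuplicateSchemaException and RaceRepro are hypothetical stand-ins for the schema, AlreadyExistsException and doAuthSetup, and the two CountDownLatches play the role of the onAlive hook used to force the bad ordering.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for org.apache.cassandra.exceptions.AlreadyExistsException.
final class DuplicateSchemaException extends RuntimeException {
    DuplicateSchemaException(String message) {
        super(message);
    }
}

// Hypothetical in-memory schema that only tracks keyspace names.
final class ToySchema {
    private final Set<String> keyspaces = ConcurrentHashMap.newKeySet();

    boolean exists(String ks) {
        return keyspaces.contains(ks);
    }

    // Rejects duplicates, like announcing a keyspace that is already defined.
    void add(String ks) {
        if (!keyspaces.add(ks)) {
            throw new DuplicateSchemaException("Cannot add already existing keyspace \"" + ks + "\"");
        }
    }
}

final class RaceRepro {
    // Forces the bad interleaving; returns what the setup thread threw, or null.
    static Throwable run() {
        ToySchema schema = new ToySchema();
        CountDownLatch checked = new CountDownLatch(1);  // setup has done its existence check
        CountDownLatch migrated = new CountDownLatch(1); // the "peer migration" has been applied
        AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread setup = new Thread(() -> {
            try {
                boolean absent = !schema.exists("system_auth");
                checked.countDown();
                if (absent) {
                    migrated.await();          // stands in for waiting on an onAlive event
                    schema.add("system_auth"); // the earlier check is now stale
                }
            } catch (Throwable t) {
                failure.set(t);
            }
        });
        Thread gossip = new Thread(() -> {
            try {
                checked.await();
                schema.add("system_auth");     // schema pulled from a peer marked alive
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            } finally {
                migrated.countDown();
            }
        });

        setup.start();
        gossip.start();
        try {
            setup.join();
            gossip.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failure.get();
    }

    public static void main(String[] args) {
        // Prints the DuplicateSchemaException raised by the setup thread.
        System.out.println(RaceRepro.run());
    }
}
```

The latches only fix the ordering that a slow node startup can produce by chance; without them the failure would be intermittent, as in the dtest runs.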

The changes required to make a consistently reproducible test would be rather
invasive, so the attached patch only includes the trivial fix. Theoretically,
this could also affect the creation of the system_traces keyspace (I imagine we
haven't seen it simply because that happens slightly earlier in node
initialization), so I've pre-emptively applied the fix there too.
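The patch itself isn't reproduced here. As a hedged sketch, assuming the fix amounts to treating "already exists" as success rather than an error, the creation step can be made idempotent so that losing the race to a migration is harmless. SchemaSketch, AlreadyExistsSketch and IdempotentSetup are hypothetical names, not Cassandra APIs; the same helper is applied to the keyspace and to each table, since the table checks can race too.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for org.apache.cassandra.exceptions.AlreadyExistsException.
final class AlreadyExistsSketch extends RuntimeException {
    AlreadyExistsSketch(String message) {
        super(message);
    }
}

// Hypothetical in-memory schema keyed by "keyspace" or "keyspace.table".
class SchemaSketch {
    private final Set<String> entries = ConcurrentHashMap.newKeySet();

    boolean exists(String name) {
        return entries.contains(name);
    }

    // Rejects duplicates, like announcing a keyspace or table that is already defined.
    void announce(String name) {
        if (!entries.add(name)) {
            throw new AlreadyExistsSketch("Cannot add already existing " + name);
        }
    }
}

final class IdempotentSetup {
    // Creates `name` if absent. If something else (e.g. a migration pulled in via gossip)
    // creates it between the check and the announce, that is treated as success.
    // Returns true only if this call performed the creation.
    static boolean createIfAbsent(SchemaSketch schema, String name) {
        if (schema.exists(name)) {
            return false;
        }
        try {
            schema.announce(name);
            return true;
        } catch (AlreadyExistsSketch e) {
            return false; // lost the race; the schema already has what we wanted
        }
    }

    // Applies the same tolerant creation to a keyspace and then to each of its tables.
    static void setupKeyspace(SchemaSketch schema, String ks, List<String> tables) {
        createIfAbsent(schema, ks);
        for (String table : tables) {
            createIfAbsent(schema, ks + "." + table);
        }
    }

    public static void main(String[] args) {
        // A schema whose existence check is always stale, so every create races.
        SchemaSketch schema = new SchemaSketch() {
            @Override
            boolean exists(String name) {
                return false;
            }
        };
        schema.announce("system_auth");
        schema.announce("system_auth.resource_role_permissons_index");
        // Without the catch in createIfAbsent, this would throw AlreadyExistsSketch.
        setupKeyspace(schema, "system_auth", List.of("resource_role_permissons_index"));
        System.out.println("setup completed");
    }
}
```

Catching the exception, rather than adding more up-front checks, is what closes the window: any existence check can go stale before the announce runs.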

> adding existing table at node startup
> -------------------------------------
>
>                 Key: CASSANDRA-8853
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8853
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Ubuntu under VirtualBox; 2 and 4GB memory
>            Reporter: Jim Witschey
>            Assignee: Sam Tunnicliffe
>             Fix For: 3.0
>
>         Attachments: 8853.txt
>
>
> I get intermittent failures running 
> [putget_test.TestPutGet|https://github.com/riptano/cassandra-dtest/blob/master/putget_test.py#L11]
>  on trunk. The core of the failure is
> {code}
> Cannot add already existing table "resource_role_permissons_index" to keyspace "system_auth"
> {code}
> I'll put in some time today seeing if it fails on previous versions.
> Here are two gists with the stdout and stderr from failing runs:
> https://gist.github.com/mambocab/b724a2c697416f21a621
> https://gist.github.com/mambocab/adb5cb90c14cda5f87c8
> Each of those was in an Ubuntu VM running under VirtualBox with 2 GB of memory.
> Here's a third that reproduced with 4GB:
> https://gist.github.com/mambocab/02ffa977eae2b5c3432b
> and here are the same for a successful run:
> https://gist.github.com/mambocab/de2a089e93bc4dff61cc
> There's some noise about reading JMX metrics in the Java stack traces that
> can be ignored for this issue. The following appears in the traces for both
> failing runs but not in the trace for the successful one:
> {code}
> java.lang.AssertionError: org.apache.cassandra.exceptions.AlreadyExistsException: Cannot add already existing table "resource_role_permissons_index" to keyspace "system_auth"
>     at org.apache.cassandra.service.StorageService.doAuthSetup(StorageService.java:897)
>     at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:832)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:579)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:469)
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:357)
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:492)
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:599)
> Caused by: org.apache.cassandra.exceptions.AlreadyExistsException: Cannot add already existing table "resource_role_permissons_index" to keyspace "system_auth"
>     at org.apache.cassandra.service.MigrationManager.announceNewColumnFamily(MigrationManager.java:286)
>     at org.apache.cassandra.service.MigrationManager.announceNewColumnFamily(MigrationManager.java:275)
>     at org.apache.cassandra.service.StorageService.doAuthSetup(StorageService.java:891)
>     ... 6 more
> {code}
> The test command is
> {code}
> CASSANDRA_DIR=~/cstar_src/cassandra PRINT_DEBUG=true nosetests -x -s -v putget_test:TestPutGet >~/putget_test.stdout 2>~/putget_test.stderr
> {code}
> I'm running in Ubuntu under VirtualBox, which may be the problem:
> {code}
> $ uname -a
> Linux dtest-VirtualBox 3.13.0-45-generic #74-Ubuntu SMP Tue Jan 13 19:36:28 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> {code}
> dtest discussion [here|https://github.com/riptano/cassandra-dtest/issues/170].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
