[ 
https://issues.apache.org/jira/browse/CASSANDRA-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Hust updated CASSANDRA-10250:
------------------------------------
    Description: 
A recently added 
[dtest|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/132/testReport/junit/concurrent_schema_changes_test/TestConcurrentSchemaChanges/create_lots_of_schema_churn_test/]
 has been flapping on cassci and has exposed an issue with running lots of 
schema alterations concurrently.  The failures occur on healthy clusters but 
seem to occur at higher rates when 1 node is down during the alters.

The test executes the following – 440 total commands:
-       Create 20 new tables
-       Drop 7 columns one at a time across 20 tables
-       Add 7 columns one at a time across 20 tables
-       Add one column index on each of the 7 columns on 20 tables
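
For reference, the 440 total breaks down as 20 creates plus 3 x (7 x 20) = 420 alter/index statements. The sketch below rebuilds that statement mix to make the arithmetic concrete. It is not the attached script: the table names (alter_me_N), the column names (s1..s7, new_s1..new_s7), the index names, and the choice to index the newly added columns are all assumptions modeled on the alter_me_7 / s1 example mentioned in this report.

```python
# Hypothetical reconstruction of the statement mix described above.
# All table, column, and index names are assumptions for illustration only.
NUM_TABLES = 20
NUM_COLUMNS = 7


def build_statements():
    """Return the CQL statements in the order the list above describes."""
    tables = [f"alter_me_{t}" for t in range(NUM_TABLES)]
    column_ids = range(1, NUM_COLUMNS + 1)
    statements = []

    # 20 CREATE TABLE statements, each table starting with columns s1..s7.
    for table in tables:
        cols = ", ".join(f"s{c} int" for c in column_ids)
        statements.append(f"CREATE TABLE {table} (id int PRIMARY KEY, {cols})")

    # 7 x 20 = 140 drops, one column at a time across all tables.
    for c in column_ids:
        for table in tables:
            statements.append(f"ALTER TABLE {table} DROP s{c}")

    # 7 x 20 = 140 adds, one column at a time across all tables.
    for c in column_ids:
        for table in tables:
            statements.append(f"ALTER TABLE {table} ADD new_s{c} int")

    # 7 x 20 = 140 single-column indexes (assumed to target the new columns).
    for c in column_ids:
        for table in tables:
            statements.append(
                f"CREATE INDEX {table}_new_s{c}_idx ON {table} (new_s{c})"
            )

    return statements


if __name__ == "__main__":
    print(len(build_statements()))  # 20 + 140 + 140 + 140 = 440
```
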

The outcome is random. The majority of the failures are dropped columns still 
being present, but new columns and indexes have also been observed to be 
incorrect.  The logs don’t contain exceptions, and the columns/indexes that are 
incorrect don’t seem to follow a pattern.  Running {{nodetool describecluster}} 
on each node shows the same schema id on all nodes.

Attached is a python script extracted from the dtest.  Running it against a 
local 3 node cluster will reproduce the issue given enough runs (it fails ~20% 
of the time on my machine).

Also attached are the node logs from a run in which a dropped column 
(alter_me_7 table, column s1) is still present.  Checking the system_schema 
tables for this case shows the s1 column in both the columns and drop_columns 
tables.
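
The check described above can be sketched as a set intersection over rows read from the two schema tables. This is a minimal sketch, not part of the attached script: fetching the rows is left out, and the (keyspace, table, column) tuple shape and the keyspace name "ks" are assumptions. It relies on this test never re-adding a dropped column under the same name, since a re-added column would presumably appear in both tables without indicating a problem.

```python
# Hypothetical helper: rows are (keyspace, table, column) tuples that the
# caller has already read from the live-columns and dropped-columns tables.
def dropped_but_present(columns, dropped_columns):
    """Return entries listed both as live columns and as dropped columns."""
    return sorted(set(columns) & set(dropped_columns))


if __name__ == "__main__":
    # Made-up sample rows mirroring the alter_me_7 / s1 case in this report.
    live = [("ks", "alter_me_7", "id"), ("ks", "alter_me_7", "s1")]
    dropped = [("ks", "alter_me_7", "s1"), ("ks", "alter_me_7", "s2")]
    print(dropped_but_present(live, dropped))
```
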

This has been flapping on cassci on versions 2+ and doesn’t seem to be related 
to changes in 3.0.  More testing needs to be done though.

//cc [~enigmacurry]


> Executing lots of schema alters concurrently can lead to dropped alters
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-10250
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10250
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Andrew Hust
>         Attachments: concurrent_schema_changes.py, node1.log, node2.log, 
> node3.log
>
>
> A recently added 
> [dtest|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest/132/testReport/junit/concurrent_schema_changes_test/TestConcurrentSchemaChanges/create_lots_of_schema_churn_test/]
>  has been flapping on cassci and has exposed an issue with running lots of 
> schema alterations concurrently.  The failures occur on healthy clusters but 
> seem to occur at higher rates when 1 node is down during the alters.
> The test executes the following – 440 total commands:
> -     Create 20 new tables
> -     Drop 7 columns one at a time across 20 tables
> -     Add 7 columns one at a time across 20 tables
> -     Add one column index on each of the 7 columns on 20 tables
> The outcome is random. The majority of the failures are dropped columns still 
> being present, but new columns and indexes have also been observed to be 
> incorrect.  The logs don’t contain exceptions, and the columns/indexes that 
> are incorrect don’t seem to follow a pattern.  Running 
> {{nodetool describecluster}} on each node shows the same schema id on all 
> nodes.
> Attached is a python script extracted from the dtest.  Running against a 
> local 3 node cluster will reproduce the issue (with enough runs – fails ~20% 
> on my machine).
> Also attached are the node logs from a run in which a dropped column 
> (alter_me_7 table, column s1) is still present.  Checking the system_schema 
> tables for this case shows the s1 column in both the columns and drop_columns 
> tables.
> This has been flapping on cassci on versions 2+ and doesn’t seem to be 
> related to changes in 3.0.  More testing needs to be done though.
> //cc [~enigmacurry]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
