[ 
https://issues.apache.org/jira/browse/CASSANDRA-12856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794639#comment-15794639
 ] 

Stefania commented on CASSANDRA-12856:
--------------------------------------

I'm trying to reproduce the problem one more time 
[here|https://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-dtest-multiplex/95/].

Other than the exception above, there is nothing else in the logs. 
Therefore, we cannot know for sure what caused the problem unless we 
can reproduce it. However, from source code inspection, this 
[line|https://github.com/apache/cassandra/blob/cassandra-3.X/src/java/org/apache/cassandra/thrift/CustomTThreadPoolServer.java#L101]
 would explain the problem if 
[{{stop()}}|https://github.com/apache/cassandra/blob/cassandra-3.X/src/java/org/apache/cassandra/thrift/CustomTThreadPoolServer.java#L159]
 is called before the Thrift server thread has executed it. The line does not 
appear to be needed either, since {{serve()}} is only called once. The failing 
test starts and stops the node immediately, so this is a plausible scenario.
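The suspected interleaving can be illustrated with a simplified, hypothetical model. It assumes (this is not stated above) that the referenced line in {{serve()}} resets a volatile stop flag before the accept loop starts. {{StopBeforeServeRace}}, {{ToyServer}}, {{resetFlagInServe}} and {{serveExitsAfterEarlyStop}} are invented names, not Cassandra or Thrift APIs. To make the failure deterministic, the sketch calls {{stop()}} before the server thread starts, which is the losing side of the race; in the real server the two threads would only rarely interleave this way.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified model of the suspected race; NOT the actual
 * CustomTThreadPoolServer code. Assumes serve() resets a "stopped" flag
 * before entering its accept loop.
 */
public class StopBeforeServeRace {

    static final class ToyServer {
        private volatile boolean stopped = false;
        private final boolean resetFlagInServe;

        ToyServer(boolean resetFlagInServe) {
            this.resetFlagInServe = resetFlagInServe;
        }

        void serve() {
            if (resetFlagInServe) {
                stopped = false; // assumed suspect line: silently undoes an earlier stop()
            }
            while (!stopped) {
                try {
                    Thread.sleep(1); // stands in for accepting a client connection
                } catch (InterruptedException e) {
                    return; // demo-only escape hatch so a hung thread can be cleaned up
                }
            }
        }

        void stop() {
            stopped = true;
        }
    }

    /** Returns true if serve() terminates when stop() runs before serve(). */
    static boolean serveExitsAfterEarlyStop(boolean resetFlagInServe) {
        ToyServer server = new ToyServer(resetFlagInServe);
        server.stop(); // the node is stopped immediately, before serve() has run
        Thread serverThread = new Thread(server::serve, "toy-thrift-server");
        serverThread.setDaemon(true);
        serverThread.start();
        try {
            serverThread.join(TimeUnit.SECONDS.toMillis(1));
            boolean exited = !serverThread.isAlive();
            if (!exited) {
                serverThread.interrupt(); // demo cleanup only
                serverThread.join();
            }
            return exited;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with reset:    serve() exits = " + serveExitsAfterEarlyStop(true));
        System.out.println("without reset: serve() exits = " + serveExitsAfterEarlyStop(false));
    }
}
```

In this model, {{serve()}} never returns after an early {{stop()}} when the flag is reset, and returns immediately when it is not, which matches the "thread looping forever" symptom. Removing the reset is safe in the model only because, as noted above, {{serve()}} is called once per server instance.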

I've prepared patches for the following branches. I've skipped trunk, because 
Thrift was removed there, and 3.11, because it is very similar to 3.X:

||2.1||2.2||3.0||3.X||
|[patch|https://github.com/stef1927/cassandra/tree/12856-2.1]|[patch|https://github.com/stef1927/cassandra/tree/12856-2.2]|[patch|https://github.com/stef1927/cassandra/tree/12856-3.0]|[patch|https://github.com/stef1927/cassandra/tree/12856-3.X]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12856-2.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12856-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12856-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12856-3.X-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12856-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12856-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12856-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12856-3.X-dtest/]|

The 2.1 patch applies cleanly to all branches; CI is currently running on 2.1 
only. I'm not sure which branches this patch should be committed to: the race 
is extremely rare, but the consequences are quite bad (a thread looping forever).


> dtest failure in 
> replication_test.SnitchConfigurationUpdateTest.test_cannot_restart_with_different_rack
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12856
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12856
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sean McCarthy
>            Assignee: Stefania
>              Labels: dtest, test-failure
>         Attachments: node1.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.1_novnode_dtest/280/testReport/replication_test/SnitchConfigurationUpdateTest/test_cannot_restart_with_different_rack
> {code}
> Error Message
> Problem stopping node node1
> {code}
> {code}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
>     testMethod()
>   File "/home/automaton/cassandra-dtest/replication_test.py", line 630, in test_cannot_restart_with_different_rack
>     node1.stop(wait_other_notice=True)
>   File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 727, in stop
>     raise NodeError("Problem stopping node %s" % self.name)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
