[ 
https://issues.apache.org/jira/browse/CASSANDRA-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344785#comment-14344785
 ] 

Stefania commented on CASSANDRA-7816:
-------------------------------------

This is quite easy to reproduce. I added a new test, {{restart_node_test}}, to 
pushed_notifications_test.py; it is available in this pull request: 
https://github.com/riptano/cassandra-dtest/pull/177.

There are always two DOWN notifications, and this is deterministic. They are 
generated by:

{code}
INFO  [GossipStage:1] 2015-03-03 01:10:47,156 Server.java:413 - Thread[GossipStage:1,5,main]
        at java.lang.Thread.getStackTrace(Thread.java:1589)
        at org.apache.cassandra.transport.Server$EventNotifier.getStackTrace(Server.java:396)
        at org.apache.cassandra.transport.Server$EventNotifier.onDown(Server.java:413)
        at org.apache.cassandra.service.StorageService.onDead(StorageService.java:2049)
        at org.apache.cassandra.gms.Gossiper.markDead(Gossiper.java:932)
        at org.apache.cassandra.gms.Gossiper.convict(Gossiper.java:319)
        at org.apache.cassandra.gms.FailureDetector.forceConviction(FailureDetector.java:251)
        at org.apache.cassandra.gms.GossipShutdownVerbHandler.doVerb(GossipShutdownVerbHandler.java:37)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}

and 

{code}
INFO  [GossipStage:1] 2015-03-03 01:11:04,254 Server.java:413 - Thread[GossipStage:1,5,main]
        at java.lang.Thread.getStackTrace(Thread.java:1589)
        at org.apache.cassandra.transport.Server$EventNotifier.getStackTrace(Server.java:396)
        at org.apache.cassandra.transport.Server$EventNotifier.onDown(Server.java:413)
        at org.apache.cassandra.service.StorageService.onDead(StorageService.java:2049)
        at org.apache.cassandra.service.StorageService.onRestart(StorageService.java:2057)
        at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:958)
        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1024)
        at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:58)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}

There are one or more UP notifications. The count is not deterministic, but 
duplicates tend to appear the third time the node is restarted. They are all 
generated by the same stack trace but on different threads, which suggests a 
contention problem, to be investigated further:

{code}
INFO  [SharedPool-Worker-2] 2015-03-03 01:11:04,419 Gossiper.java:916 - InetAddress /127.0.0.2 is now UP
INFO  [SharedPool-Worker-2] 2015-03-03 01:11:04,421 Server.java:407 - Thread[SharedPool-Worker-2,10,main]
        at java.lang.Thread.getStackTrace(Thread.java:1589)
        at org.apache.cassandra.transport.Server$EventNotifier.getStackTrace(Server.java:396)
        at org.apache.cassandra.transport.Server$EventNotifier.onUp(Server.java:407)
        at org.apache.cassandra.service.StorageService.onAlive(StorageService.java:2033)
        at org.apache.cassandra.gms.Gossiper.realMarkAlive(Gossiper.java:918)
        at org.apache.cassandra.gms.Gossiper.access$900(Gossiper.java:67)
        at org.apache.cassandra.gms.Gossiper$2.response(Gossiper.java:900)
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:54)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
        at java.lang.Thread.run(Thread.java:745)
{code}

Sample output of the test (with assertions commented out):

{code}
KEEP_LOGS=true PRINT_DEBUG=true nosetests -s -a 'selected' pushed_notifications_test.py
cluster ccm directory: /tmp/dtest-AQzO0X
Restarting second node...
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Waiting for notifications from 127.0.0.1
Restarting second node...
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Waiting for notifications from 127.0.0.1
Restarting second node...
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Waiting for notifications from 127.0.0.1
removing ccm cluster test at: /tmp/dtest-AQzO0X
.
----------------------------------------------------------------------
Ran 1 test in 94.861s

OK
{code}
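
To make the duplicates in output like the above easier to spot, the notifications can be tallied per restart. The following is only a rough sketch, not part of the dtest in the pull request: the names {{tally_notifications}}, {{EVENT_RE}} and {{SAMPLE}} are hypothetical, and it assumes the debug lines keep the "Source <ip> sent <UP|DOWN> for <ip>" shape shown in the sample output.

```python
import re
from collections import Counter

# Matches debug lines such as "Source 127.0.0.1 sent DOWN for 127.0.0.2".
# The line format is assumed from the sample test output; it is not a stable API.
EVENT_RE = re.compile(r"^Source (\S+) sent (UP|DOWN) for (\S+)$")


def tally_notifications(output):
    """Return one Counter of UP/DOWN events per 'Restarting second node...' block."""
    restarts = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Restarting second node"):
            # A new restart begins: start a fresh tally.
            restarts.append(Counter())
            continue
        match = EVENT_RE.match(line)
        # Events seen before the first restart marker are ignored.
        if match and restarts:
            restarts[-1][match.group(2)] += 1
    return restarts


# First restart block copied from the sample output.
SAMPLE = """\
Restarting second node...
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent DOWN for 127.0.0.2
Source 127.0.0.1 sent UP for 127.0.0.2
Waiting for notifications from 127.0.0.1
"""

for i, counts in enumerate(tally_notifications(SAMPLE), start=1):
    print("restart %d: DOWN=%d UP=%d" % (i, counts["DOWN"], counts["UP"]))
# prints "restart 1: DOWN=2 UP=1"
```

Fed the full sample output, a tally like this would show two DOWN events for every restart and a varying number of UP events, which matches the deterministic and non-deterministic behaviour described above.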

> Updated the "4.2.6. EVENT" section in the binary protocol specification
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-7816
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7816
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Documentation & website
>            Reporter: Michael Penick
>            Assignee: Stefania
>            Priority: Trivial
>         Attachments: tcpdump_repeating_status_change.txt, trunk-7816.txt
>
>
> Added "MOVED_NODE" as a possible type of topology change and also specified 
> that it is possible to receive the same event multiple times.



