[
https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738061#comment-14738061
]
Stefania commented on CASSANDRA-10089:
--------------------------------------
I spent some more time debugging Gossip updates when nodes are restarted, by
adding extra debug messages. It's not that _we remove tokens in status
shutdown_. Rather, when a node restarts there is a window in which we have
received a major Gossip update with no tokens, followed by delta updates that
also carry no tokens or status, followed eventually by a delta or full update
with status NORMAL and some tokens. My guess is that we mark the node as
SHUTDOWN during this window, while we have no tokens. Then, the next time we
get a delta without tokens, we crash because we are in status SHUTDOWN with no
tokens.
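To make the suspected sequence concrete, here is a toy model of it. This is only a sketch of the hypothesis above, not Gossiper or StorageService code: {{RestartWindowSketch}}, its {{Key}} enum and all of its methods are hypothetical names, and the ordering of events is the guess described in this comment, not something confirmed yet.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

// Toy model of what one node knows about a restarting peer.
// Hypothetical names throughout; not Cassandra's actual classes.
final class RestartWindowSketch
{
    enum Key { STATUS, TOKENS }

    private final Map<Key, String> state = new EnumMap<>(Key.class);

    // A major update replaces everything we knew about the peer.
    void majorUpdate(Map<Key, String> fresh)
    {
        state.clear();
        state.putAll(fresh);
    }

    // A delta update only overwrites the keys it carries.
    void delta(Map<Key, String> changes)
    {
        state.putAll(changes);
    }

    // Assumed to happen inside the window, while no tokens are known.
    void markShutdown()
    {
        state.put(Key.STATUS, "SHUTDOWN");
    }

    // The suspect combination: status SHUTDOWN with no tokens recorded.
    boolean shutdownWithoutTokens()
    {
        return "SHUTDOWN".equals(state.get(Key.STATUS)) && !state.containsKey(Key.TOKENS);
    }

    public static void main(String[] args)
    {
        RestartWindowSketch peer = new RestartWindowSketch();
        Map<Key, String> empty = Collections.emptyMap();

        peer.majorUpdate(empty);   // major update with no tokens
        peer.markShutdown();       // marked SHUTDOWN during the window
        peer.delta(empty);         // delta with no tokens or status
        System.out.println(peer.shutdownWithoutTokens()); // true

        Map<Key, String> normal = new EnumMap<>(Key.class);
        normal.put(Key.STATUS, "NORMAL");
        normal.put(Key.TOKENS, "42");
        peer.delta(normal);        // eventual update with NORMAL and tokens
        System.out.println(peer.shutdownWithoutTokens()); // false
    }
}
```

In this model the peer stays in the suspect state until an update carrying both status and tokens arrives, which matches the window described above.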
Locally on my box, the node is always marked as SHUTDOWN by the Gossip shutdown
verb handler, that is, before the first major update with no tokens, so I
cannot reproduce the problem. I am going to run a branch with these additional
debug messages on Jenkins to see if I can gather more information.
bq. Maybe we should extend the user-level error to cover "null or empty" and
not just null.
So we would log at ERROR level if {{handleStateNormal}} is called with no
tokens?
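A minimal sketch of what a "null or empty" guard could look like follows. It is a hypothetical stand-in, not the real {{StorageService.handleStateNormal}}: the class name {{TokenGuardSketch}}, the simplified signature (tokens as strings), the boolean return value and the log message are all assumptions made for illustration, and {{java.util.logging}} with {{Level.SEVERE}} stands in for the ERROR level.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the state handler; not Cassandra's StorageService.
final class TokenGuardSketch
{
    private static final Logger logger = Logger.getLogger(TokenGuardSketch.class.getName());

    /**
     * Returns true if the state change would be applied,
     * false if it was skipped because no tokens were supplied.
     */
    static boolean handleStateNormal(String endpoint, Collection<String> tokens)
    {
        // Treat "no tokens at all" and "empty token list" the same way,
        // logging an error instead of failing later with a NullPointerException.
        if (tokens == null || tokens.isEmpty())
        {
            logger.log(Level.SEVERE, "Node {0} is in a normal state but has no tokens; ignoring", endpoint);
            return false;
        }

        // The real handler would update token ownership here.
        return true;
    }

    public static void main(String[] args)
    {
        System.out.println(handleStateNormal("127.0.0.2", null));                              // false
        System.out.println(handleStateNormal("127.0.0.2", Collections.<String>emptyList()));   // false
        System.out.println(handleStateNormal("127.0.0.2", Arrays.asList("42")));               // true
    }
}
```

Whether ERROR is the right level is exactly the open question: if updates without tokens are expected during the restart window, a lower level may be more appropriate.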
> NullPointerException in Gossip handleStateNormal
> ------------------------------------------------
>
> Key: CASSANDRA-10089
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10089
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stefania
> Assignee: Stefania
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> Whilst comparing dtests for CASSANDRA-9970 I found [this failing
> dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/]
> in 2.2:
> {code}
> Unexpected error in node1 node log: ['ERROR [GossipStage:1] 2015-08-14
> 15:39:57,873 CassandraDaemon.java:183 - Exception in thread
> Thread[GossipStage:1,5,main] java.lang.NullPointerException: null \tat
> org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731)
> ~[main/:na] \tat
> org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804)
> ~[main/:na] \tat
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857)
> ~[main/:na] \tat
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629)
> ~[main/:na] \tat
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312)
> ~[main/:na] \tat
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025)
> ~[main/:na] \tat
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106)
> ~[main/:na] \tat
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
> ~[main/:na] \tat
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
> ~[main/:na] \tat
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> ~[na:1.7.0_80] \tat
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> ~[na:1.7.0_80] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]']
> {code}
> I wasn't able to find it on unpatched branches, but it is clearly not related
> to CASSANDRA-9970; if anything, it could be a side effect of
> CASSANDRA-9871.