[ 
https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738061#comment-14738061
 ] 

Stefania commented on CASSANDRA-10089:
--------------------------------------

I spent some more time debugging Gossip updates when nodes are restarted, by 
adding additional debug messages. It's not that _we remove tokens in status 
shutdown_; rather, when a node restarts there is a window where we have received 
a major Gossip update with no tokens, followed by delta updates also with no 
tokens or status, followed eventually by a delta or full update with status 
NORMAL and some tokens. My guess is that we mark the node as SHUTDOWN during 
this window, while we have no tokens. Then the next time we get a delta without 
tokens, we crash because we are in status SHUTDOWN with no tokens.

Locally on my box, the node is always marked as SHUTDOWN by the Gossip shutdown 
verb handler, that is, before the first major update with no tokens, and 
therefore I cannot reproduce the problem. I am going to run a branch with these 
additional debug messages on Jenkins to see if I can gather some more information.

bq. Maybe we should extend the user-level error to cover "null or empty" and 
not just null.

So we would log at ERROR level if {{handleStateNormal}} is called with no 
tokens?
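
To make the "null or empty" suggestion concrete, here is a minimal sketch of 
such a guard. It is not the actual {{StorageService}} code: the class 
{{TokenGuardSketch}}, the simplified {{handleStateNormal}} signature, the 
{{errors}} list standing in for an ERROR-level logger, and the message text are 
all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for illustration only; not Cassandra's StorageService.
final class TokenGuardSketch {
    // Stands in for an ERROR-level logger so the sketch stays self-contained.
    static final List<String> errors = new ArrayList<>();

    /**
     * Returns true if the tokens were usable, false if the update was skipped
     * because the tokens were null or empty.
     */
    static boolean handleStateNormal(String endpoint, Collection<String> tokens) {
        // Treat "no tokens gossiped yet" (null) and "empty token list" the same way.
        if (tokens == null || tokens.isEmpty()) {
            errors.add("Node " + endpoint + " has no tokens; ignoring state change");
            return false;
        }
        // A real implementation would update the token metadata here.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(handleStateNormal("10.0.0.1", null));
        System.out.println(handleStateNormal("10.0.0.1", Collections.<String>emptyList()));
        System.out.println(handleStateNormal("10.0.0.1", Collections.singletonList("42")));
        System.out.println(errors);
    }
}
```

With a guard of this shape, an update that arrives during the window with no 
tokens would be reported and skipped rather than causing a 
NullPointerException on the Gossip stage. Whether ERROR is the right level for 
that report is the open question.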

> NullPointerException in Gossip handleStateNormal
> ------------------------------------------------
>
>                 Key: CASSANDRA-10089
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10089
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> Whilst comparing dtests for CASSANDRA-9970 I found [this failing 
> dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/]
>  in 2.2:
> {code}
> Unexpected error in node1 node log: ['ERROR [GossipStage:1] 2015-08-14 
> 15:39:57,873 CassandraDaemon.java:183 - Exception in thread 
> Thread[GossipStage:1,5,main] java.lang.NullPointerException: null \tat 
> org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
>  ~[main/:na] \tat 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
> ~[main/:na] \tat 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_80] \tat 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  ~[na:1.7.0_80] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]']
> {code}
> I wasn't able to find it on unpatched branches, but it is clearly not related 
> to CASSANDRA-9970; if anything, it could have been a side effect of 
> CASSANDRA-9871.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
