[ 
https://issues.apache.org/jira/browse/CASSANDRA-14155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406751#comment-16406751
 ] 

Sam Tunnicliffe commented on CASSANDRA-14155:
---------------------------------------------

I'm not sure that the scenario above can happen quite as described. When 
{{loadRingState}} adds the endpoints to {{endpointStateMap}} they're created 
with a brand new {{HeartBeatState}}, one with {{(generation, version) == (0, 
0)}}. In {{Gossiper::examineGossiper}}, the empty digest list in a shadow SYN 
is replaced with a list containing one digest for every known endpoint and 
these are also initialized with {{(0,0)}}. So if a node were to finish its 
shadow round, load ring state, start gossip and immediately receive a shadow 
round SYN from a peer, it would not include any state for that peer as the 
generation/version in the digest would match the one in the local epState. 
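
To make that comparison concrete, here is a rough sketch of the reasoning. It 
is a simplified, hypothetical model and not the actual Gossiper code: the 
class {{ShadowSynSketch}}, the {{Heartbeat}} type and {{endpointsToSend}} are 
names invented for illustration, and endpoints are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the digest comparison described above.
final class ShadowSynSketch {
    // Stand-in for HeartBeatState: only the (generation, version) pair matters here.
    static final class Heartbeat {
        final int generation;
        final int version;

        Heartbeat(int generation, int version) {
            this.generation = generation;
            this.version = version;
        }
    }

    // Models handling of a shadow SYN: the empty digest list is treated as one
    // (0, 0) digest per known endpoint, and state is only sent back for
    // endpoints whose local heartbeat is newer than that digest.
    static List<String> endpointsToSend(Map<String, Heartbeat> localStates) {
        final int digestGeneration = 0;
        final int digestVersion = 0;
        List<String> toSend = new ArrayList<>();
        for (Map.Entry<String, Heartbeat> entry : localStates.entrySet()) {
            Heartbeat local = entry.getValue();
            boolean localIsNewer = local.generation > digestGeneration
                    || (local.generation == digestGeneration && local.version > digestVersion);
            if (localIsNewer) {
                toSend.add(entry.getKey());
            }
        }
        return toSend;
    }
}
```

In this model, an endpoint freshly added by {{loadRingState}} with {{(0,0)}} 
is never included in the reply, while an endpoint with a real generation is.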

Of course though, the stacktrace in the description certainly indicates that 
the epStates map obtained from the shadow round did contain a state for the 
node in question and that its {{HOST_ID}} appState is missing. So I'm all for 
adding the check & assertion error in {{isSafeForStartup}}, although I think we 
ought to log more detail here, probably the epStates map in its entirety. I'm 
less comfortable with changing the behaviour of the shadow round if we're not 
really clear on what's causing it. As we've only seen this sporadically in 
tests, how do you feel about adding the assertion (& any other error logging 
that may be useful) and seeing if that helps us track down the cause if/when we 
see the error in future test runs? My fear is that this is a symptom of a more 
pernicious race like the ones in CASSANDRA-13700 & CASSANDRA-11825.
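
As a rough illustration of the kind of check I mean, something along these 
lines would fail with the full map in the message instead of a bare NPE. This 
is only a sketch under assumptions: {{HostIdCheckSketch}} and 
{{requireHostId}} are hypothetical names, and the epStates map is modelled as 
plain strings rather than the real endpoint state types.

```java
import java.util.Map;

// Hypothetical sketch of a null check that reports the whole epStates map.
final class HostIdCheckSketch {
    // Returns the HOST_ID recorded for the endpoint, or throws an AssertionError
    // whose message includes every state obtained from the shadow round.
    static String requireHostId(String endpoint, Map<String, Map<String, String>> epStates) {
        Map<String, String> appStates = epStates.get(endpoint);
        String hostId = appStates == null ? null : appStates.get("HOST_ID");
        if (hostId == null) {
            throw new AssertionError("No HOST_ID found for endpoint " + endpoint
                    + "; epStates from shadow round: " + epStates);
        }
        return hostId;
    }
}
```

Dumping the whole map rather than just the one entry should show whether the 
other peers' states look sane when the error next turns up in a test run.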

> [TRUNK] Gossiper somewhat frequently hitting an NPE on node startup with 
> dtests at 
> org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:769)
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14155
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14155
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Michael Kjellman
>            Assignee: Jason Brown
>            Priority: Major
>
> Gossiper is somewhat frequently hitting an NPE on node startup with dtests at 
> org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:769)
> {code}
> test teardown failure
> Unexpected error found in node logs (see stdout for full details). Errors: 
> [ERROR [main] 2018-01-08 21:41:01,832 CassandraDaemon.java:675 - Exception 
> encountered during startup
> java.lang.NullPointerException: null
>         at 
> org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:769) 
> ~[main/:na]
>         at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:511)
>  ~[main/:na]
>         at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:761)
>  ~[main/:na]
>         at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:621)
>  ~[main/:na]
>         at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:568)
>  ~[main/:na]
>         at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:360) 
> [main/:na]
>         at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569)
>  [main/:na]
>         at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:658) 
> [main/:na], ERROR [main] 2018-01-08 21:41:01,832 CassandraDaemon.java:675 - 
> Exception encountered during startup
> java.lang.NullPointerException: null
>         at 
> org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:769) 
> ~[main/:na]
>         at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:511)
>  ~[main/:na]
>         at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:761)
>  ~[main/:na]
>         at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:621)
>  ~[main/:na]
>         at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:568)
>  ~[main/:na]
>         at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:360) 
> [main/:na]
>         at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569)
>  [main/:na]
>         at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:658) 
> [main/:na]]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)