[
https://issues.apache.org/jira/browse/CASSANDRA-14155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406751#comment-16406751
]
Sam Tunnicliffe commented on CASSANDRA-14155:
---------------------------------------------
I'm not sure that the scenario above can happen quite as described. When
{{loadRingState}} adds the endpoints to {{endpointStateMap}}, they're created
with a brand new {{HeartBeatState}}, one with
{{(generation, version) == (0, 0)}}. In {{Gossiper::examineGossiper}}, the
empty digest list in a shadow SYN is replaced with a list containing one
digest for every known endpoint, and these are also initialized with
{{(0,0)}}. So if a node were to finish its shadow round, load ring state,
start gossip and immediately receive a shadow round SYN from a peer, it would
not include any state for that peer, as the generation/version in the digest
would match the one in the local epState.
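To make the comparison above concrete, here is a minimal sketch. It assumes a heavily simplified model: the {{Heartbeat}} and {{Digest}} classes and the {{endpointsToSend}} helper are hypothetical stand-ins, not Cassandra's actual classes, and the real code tracks more than a single (generation, version) pair. It only illustrates why a digest of (0, 0) compared against a local state of (0, 0) produces nothing to send.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; not Cassandra's actual classes.
class ShadowDigestSketch {
    /** Stand-in for a heartbeat: just a (generation, version) pair. */
    static final class Heartbeat {
        final int generation;
        final int version;

        Heartbeat(int generation, int version) {
            this.generation = generation;
            this.version = version;
        }
    }

    /** Stand-in for a digest: an endpoint plus the sender's view of it. */
    static final class Digest {
        final String endpoint;
        final int generation;
        final int version;

        Digest(String endpoint, int generation, int version) {
            this.endpoint = endpoint;
            this.generation = generation;
            this.version = version;
        }
    }

    /** Returns the endpoints whose local state is strictly newer than the digest. */
    static List<String> endpointsToSend(List<Digest> digests, Map<String, Heartbeat> local) {
        List<String> out = new ArrayList<>();
        for (Digest d : digests) {
            Heartbeat hb = local.get(d.endpoint);
            if (hb == null)
                continue; // nothing known locally, so nothing to send
            boolean newer = hb.generation > d.generation
                    || (hb.generation == d.generation && hb.version > d.version);
            if (newer)
                out.add(d.endpoint);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Heartbeat> local = new HashMap<>();
        local.put("127.0.0.2", new Heartbeat(0, 0));      // as created after loading ring state

        List<Digest> shadowSyn = new ArrayList<>();
        shadowSyn.add(new Digest("127.0.0.2", 0, 0));     // as filled in for a shadow SYN

        System.out.println(endpointsToSend(shadowSyn, local)); // prints []
    }
}
```

In this model, state for an endpoint is only returned once the local generation or version has moved past (0, 0), which is why the scenario as described looks unlikely.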
Of course, though, the stacktrace in the description certainly indicates that
the epStates map obtained from the shadow round did contain a state for the
node in question and that its {{HOST_ID}} appState is missing. So I'm all for
adding the check & assertion error in {{isSafeForStartup}}, although I think we
ought to log more detail here, probably the epStates map in its entirety. I'm
less comfortable with changing the behaviour of the shadow round if we're not
really clear on what's causing it. As we've only seen this sporadically in
tests, how do you feel about adding the assertion (& any other error logging
that may be useful) and seeing if that helps us track down the cause if/when we
see the error in future test runs? My fear is that this is a symptom of a more
pernicious race like the ones in CASSANDRA-13700 & CASSANDRA-11825.
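As a rough illustration of the kind of check being discussed, the sketch below fails loudly when an endpoint has state but no {{HOST_ID}}, and puts the whole map into the error message. Everything here is an assumption for illustration: the {{requireHostId}} helper and the map-of-maps representation of epStates are hypothetical, and the real {{isSafeForStartup}} works with Cassandra's own endpoint state types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; not Cassandra's actual classes or signatures.
class HostIdCheckSketch {
    /**
     * Returns the HOST_ID recorded for the endpoint, or null if the endpoint
     * is unknown. If the endpoint has state but no HOST_ID, throws an
     * AssertionError whose message includes the entire epStates map.
     */
    static String requireHostId(String endpoint, Map<String, Map<String, String>> epStates) {
        Map<String, String> appStates = epStates.get(endpoint);
        if (appStates == null)
            return null; // no state at all for this endpoint

        String hostId = appStates.get("HOST_ID");
        if (hostId == null)
            throw new AssertionError("Missing HOST_ID for " + endpoint + "; epStates=" + epStates);
        return hostId;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> epStates = new HashMap<>();
        Map<String, String> appStates = new HashMap<>();
        appStates.put("STATUS", "NORMAL"); // state present, but HOST_ID absent
        epStates.put("127.0.0.2", appStates);

        try {
            requireHostId("127.0.0.2", epStates);
        } catch (AssertionError e) {
            // The message carries the full map, which is the extra detail asked for above.
            System.out.println(e.getMessage());
        }
    }
}
```

Replacing the bare NPE with an error that carries the full map should make the next sporadic failure much easier to diagnose from test logs alone.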
> [TRUNK] Gossiper somewhat frequently hitting an NPE on node startup with
> dtests at
> org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:769)
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-14155
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14155
> Project: Cassandra
> Issue Type: Bug
> Reporter: Michael Kjellman
> Assignee: Jason Brown
> Priority: Major
>
> Gossiper is somewhat frequently hitting an NPE on node startup with dtests at
> org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:769)
> {code}
> test teardown failure
> Unexpected error found in node logs (see stdout for full details). Errors:
> [ERROR [main] 2018-01-08 21:41:01,832 CassandraDaemon.java:675 - Exception encountered during startup
> java.lang.NullPointerException: null
> at org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:769) ~[main/:na]
> at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:511) ~[main/:na]
> at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:761) ~[main/:na]
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:621) ~[main/:na]
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:568) ~[main/:na]
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:360) [main/:na]
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569) [main/:na]
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:658) [main/:na],
> ERROR [main] 2018-01-08 21:41:01,832 CassandraDaemon.java:675 - Exception encountered during startup
> java.lang.NullPointerException: null
> at org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:769) ~[main/:na]
> at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:511) ~[main/:na]
> at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:761) ~[main/:na]
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:621) ~[main/:na]
> at org.apache.cassandra.service.StorageService.initServer(StorageService.java:568) ~[main/:na]
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:360) [main/:na]
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569) [main/:na]
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:658) [main/:na]]
> {code}