[ 
https://issues.apache.org/jira/browse/CASSANDRA-16588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322315#comment-17322315
 ] 

Brandon Williams commented on CASSANDRA-16588:
----------------------------------------------

Your 3.11 build aborted for some reason, so I started it again.
[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/666/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/666/pipeline]

I was a bit concerned that we could get a valid shadow ack in which our own IP
was somehow missing HOST_ID, leaving us in shadow forever and failing startup.
In fact, our empty state from CASSANDRA-16561 would do exactly this, except
that its generation and version are both zero, so the seed filters the empty
state out of its reply (by luck, since neither generation nor version is
perceived as changed). So I opted for detecting the bad ack as specifically as
possible.
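
To illustrate why the empty state is never sent, here is a minimal sketch of the comparison involved, assuming the seed only returns state it considers newer than the (generation, version) pair it is asked about. The class and method names are hypothetical and this is not the actual Gossiper code.

```java
public class DigestFilterSketch {
    /** Hypothetical stand-in for a (generation, version) pair carried by gossip state. */
    public static final class Stamp {
        public final int generation;
        public final int version;

        public Stamp(int generation, int version) {
            this.generation = generation;
            this.version = version;
        }
    }

    /**
     * Returns true only if the local state looks newer than what was requested.
     * Assumption: generation is compared first, then version.
     */
    public static boolean shouldSend(Stamp local, Stamp requested) {
        if (local.generation != requested.generation)
            return local.generation > requested.generation;
        return local.version > requested.version;
    }

    public static void main(String[] args) {
        Stamp askedFor = new Stamp(0, 0);
        // Empty state: generation 0, version 0, so nothing is perceived as changed.
        System.out.println(shouldSend(new Stamp(0, 0), askedFor));
        // Any real state has a non-zero generation and would be sent.
        System.out.println(shouldSend(new Stamp(1618500000, 5), askedFor));
    }
}
```

Under these assumptions the empty state is skipped only because both of its values happen to be zero, which is why relying on the filter alone is fragile.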

That said, there may be other unintentional bad responses possible here that
your patch would catch, preventing the NPE. I'm not sure which route is best.
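
A rough sketch of the two routes, for comparison. This is not either actual patch; the map-based ack representation and all names here are hypothetical stand-ins for the real gossip types.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

public class ShadowAckSketch {
    public static final String HOST_ID = "HOST_ID";

    /**
     * Narrow route: flag an ack as bad only when it mentions our own address
     * but carries no HOST_ID for it, i.e. it is not a genuine shadow response.
     */
    public static boolean isBadShadowAck(String self, Map<String, Map<String, String>> ack) {
        Map<String, String> own = ack.get(self);
        return own != null && !own.containsKey(HOST_ID);
    }

    /**
     * Broad route: never dereference a missing HOST_ID, whatever the cause.
     * An empty result means "no host id known" rather than an NPE.
     */
    public static Optional<String> hostIdOf(String endpoint, Map<String, Map<String, String>> ack) {
        Map<String, String> state = ack.get(endpoint);
        return state == null ? Optional.empty() : Optional.ofNullable(state.get(HOST_ID));
    }

    public static void main(String[] args) {
        // A stale ack that names us but has no application state at all.
        Map<String, Map<String, String>> stale =
            Collections.singletonMap("10.0.0.1", Collections.<String, String>emptyMap());
        System.out.println(isBadShadowAck("10.0.0.1", stale));
        System.out.println(hostIdOf("10.0.0.1", stale).isPresent());
    }
}
```

The narrow check rejects exactly the case described above and nothing else, while the broad guard also covers malformed responses nobody has thought of yet, at the cost of hiding why the HOST_ID was missing.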






> NPE getting host_id in Gossiper.isSafeForStartup
> ------------------------------------------------
>
>                 Key: CASSANDRA-16588
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16588
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>            Priority: Normal
>             Fix For: 3.11.x, 4.0-rc
>
>
> As seen here: 
> https://ci-cassandra.apache.org/job/Cassandra-devbranch/604/testReport/junit/org.apache.cassandra.distributed.upgrade/MixedModeGossipTest/testStatusFieldShouldExistInOldVersionNodesEdgeCase/
> {noformat}
> java.lang.NullPointerException
>       at org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:952)
>       at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:657)
>       at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:933)
>       at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>       at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>       at org.apache.cassandra.distributed.impl.Instance.lambda$startup$10(Instance.java:541)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>       at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I believe what is happening is that a GossipDigestAck was queued on the seed
> to ack the node's shutdown state, but isn't actually sent until the node has
> restarted and gone into shadow. Since the ack contains the node's IP, the
> node assumes a host_id will be there, but since this is not an actual shadow
> response, it is not.


