[
https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877334#comment-14877334
]
Stefania commented on CASSANDRA-10231:
--------------------------------------
I've attached [a patch|https://github.com/stef1927/cassandra/commits/10231-3.0]
for 3.0 that fixes the dtest. [~jkni], would you mind trying the patch with your
Jepsen test? It also logs a message for every GOSSIP entry, so the log files
may get a bit bigger, but we will have helpful information should the patch not
work.
The patch fixes the exception below, which causes any GOSSIP properties that
{{onChange}} would apply after STATUS to be skipped:
{code}
ERROR [GossipStage:2] 2015-09-19 14:14:31,007 CassandraDaemon.java:195 - Exception in thread Thread[GossipStage:2,5,main]
java.lang.NullPointerException: null
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) ~[na:1.8.0_60]
    at org.apache.cassandra.hints.HintsCatalog.get(HintsCatalog.java:85) ~[main/:na]
    at org.apache.cassandra.hints.HintsService.excise(HintsService.java:263) ~[main/:na]
    at org.apache.cassandra.service.StorageService.excise(StorageService.java:2166) ~[main/:na]
    at org.apache.cassandra.service.StorageService.excise(StorageService.java:2178) ~[main/:na]
    at org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:2083) ~[main/:na]
    at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1672) ~[main/:na]
    at org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1220) ~[main/:na]
    at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1202) ~[main/:na]
    at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1159) ~[main/:na]
    at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) ~[main/:na]
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[main/:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
{code}
According to the additional GOSSIP trace message that I've added, the host id
was one of the skipped properties.
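To illustrate the failure mode in the trace, here is a minimal standalone sketch. It is not the attached patch and does not use Cassandra classes. It only relies on the fact that {{ConcurrentHashMap.get(null)}} throws a {{NullPointerException}}, and shows how an exception raised while handling STATUS keeps the entries after it from being applied. The class {{NullHostIdSketch}}, the {{CATALOG}} map, the {{guarded}} flag and the state list are all hypothetical names chosen for the example, and the null check is just one assumed way to avoid the exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class NullHostIdSketch {
    // Hypothetical stand-in for a per-host catalog; not the real HintsCatalog.
    static final ConcurrentHashMap<UUID, String> CATALOG = new ConcurrentHashMap<>();

    // Hypothetical handler: only the STATUS entry consults the catalog.
    static void onChange(String state, UUID hostId, boolean guarded) {
        if (!state.equals("STATUS")) {
            return;
        }
        if (guarded && hostId == null) {
            return; // assumed guard: host id not known yet, nothing to look up
        }
        CATALOG.get(hostId); // throws NullPointerException when hostId is null
    }

    // Applies the states in order and returns the ones that were applied
    // before any exception interrupted the loop.
    static List<String> applyStates(List<String> states, UUID hostId, boolean guarded) {
        List<String> applied = new ArrayList<>();
        try {
            for (String state : states) {
                onChange(state, hostId, guarded);
                applied.add(state);
            }
        } catch (NullPointerException e) {
            // In the trace above the exception escapes the gossip thread;
            // this sketch simply stops and reports what was applied so far.
        }
        return applied;
    }

    public static void main(String[] args) {
        // Illustrative ordering only: STATUS first, other entries after it.
        List<String> states = Arrays.asList("STATUS", "HOST_ID", "RACK");
        System.out.println(applyStates(states, null, false)); // prints []
        System.out.println(applyStates(states, null, true));  // prints [STATUS, HOST_ID, RACK]
    }
}
```

With a null host id and no guard, the loop aborts at STATUS and nothing after it is applied. With the guard, every entry is processed.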
> Null status entries on nodes that crash during decommission of a different
> node
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-10231
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10231
> Project: Cassandra
> Issue Type: Bug
> Reporter: Joel Knighton
> Assignee: Stefania
> Fix For: 3.0.x
>
>
> This issue is reproducible through a Jepsen test of materialized views that
> crashes and decommissions nodes throughout the test.
> In a 5 node cluster, if a node crashes at a certain point (unknown) during
> the decommission of a different node, it may start with a null entry for the
> decommissioned node like so:
> DN 10.0.0.5 ? 256 ? null rack1
> This entry does not get updated/cleared by gossip. This entry is removed upon
> a restart of the affected node.
> This issue is further detailed in ticket
> [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].