[ https://issues.apache.org/jira/browse/CASSANDRA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-1463:
--------------------------------------
Assignee: Jonathan Ellis
Fix Version/s: 0.6.6
Affects Version/s: 0.6 (was: 0.6.5)
> Failed bootstrap can cause NPE in batch_mutate on every node, taking down the entire cluster
> --------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-1463
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1463
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6
> Reporter: David King
> Assignee: Jonathan Ellis
> Fix For: 0.6.6
>
>
> While adding a node to the cluster, the bootstrap failed (I am still
> investigating the cause). An hour later, the entire cluster failed and stopped
> accepting writes. These exceptions started appearing in the logs:
> {quote}
> INFO [Timer-0] 2010-09-03 12:23:33,282 Gossiper.java (line 402) FatClient /10.251.243.191 has been silent for 3600000ms, removing from gossip
> ERROR [Timer-0] 2010-09-03 12:23:33,318 Gossiper.java (line 99) Gossip error
> java.util.ConcurrentModificationException
> at java.util.Hashtable$Enumerator.next(Hashtable.java:1048)
> at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:383)
> at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:93)
> at java.util.TimerThread.mainLoop(Timer.java:534)
> at java.util.TimerThread.run(Timer.java:484)
> ERROR [pool-1-thread-69153] 2010-09-03 12:23:33,857 Cassandra.java (line 1659) Internal error processing batch_mutate
> java.lang.NullPointerException
> at org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:135)
> at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:85)
> at org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204)
> at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:415)
> at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:1651)
> at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1166)
> at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:636)
> ERROR [pool-1-thread-69154] 2010-09-03 12:23:33,869 Cassandra.java (line 1659) Internal error processing batch_mutate
> java.lang.NullPointerException
> at org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:135)
> at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:85)
> at org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204)
> at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:415)
> at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:1651)
> at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1166)
> at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:636)
> {quote}
> After a large number of repetitions (at least thousands), the printed
> exception was shortened to the following (this shortening is what led me to
> mistakenly file #1462):
> {quote}
> ERROR [pool-1-thread-68869] 2010-09-03 12:39:22,857 Cassandra.java (line 1659) Internal error processing batch_mutate
> java.lang.NullPointerException
> ERROR [pool-1-thread-68869] 2010-09-03 12:39:22,883 Cassandra.java (line 1659) Internal error processing batch_mutate
> java.lang.NullPointerException
> ERROR [pool-1-thread-68869] 2010-09-03 12:39:22,894 Cassandra.java (line 1659) Internal error processing batch_mutate
> java.lang.NullPointerException
> ERROR [pool-1-thread-68970] 2010-09-03 12:39:22,985 Cassandra.java (line 1659) Internal error processing batch_mutate
> java.lang.NullPointerException
> ERROR [pool-1-thread-68970] 2010-09-03 12:39:23,084 Cassandra.java (line 1659) Internal error processing batch_mutate
> java.lang.NullPointerException
> {quote}
> A rolling restart of the cluster fixed it, but every node had to be
> restarted before the cluster accepted writes again.
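The logs in the report point at two distinct failure patterns. The following is a minimal, hypothetical reduction of both, assuming (1) that the status check removed an entry from the Hashtable it was iterating (the FatClient removal is logged 36 ms before the ConcurrentModificationException from `Gossiper.doStatusCheck`), and (2) that the evicted endpoint was still treated as a replica, so the liveness lookup behind `FailureDetector.isAlive` found no state for it. Neither assumption is confirmed by the report. The class and method names (`GossipFailureSketch`, `removeSilentUnsafe`, `isAliveSafe`, `liveReplicas`, `EndpointState`) and the 10.0.0.x endpoints are made up for illustration; this is not the Cassandra source or its fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical reduction of the two failures in the report.
// None of these names come from the Cassandra source.
public class GossipFailureSketch {

    // Part 1: removing from a Hashtable while iterating it with for-each
    // bumps the table's modCount, so the iterator's next call to next()
    // throws ConcurrentModificationException (while entries remain unvisited).
    static void removeSilentUnsafe(Map<String, Long> lastHeard, long now, long timeoutMs) {
        for (String endpoint : lastHeard.keySet()) {
            if (now - lastHeard.get(endpoint) > timeoutMs) {
                lastHeard.remove(endpoint);
            }
        }
    }

    // Removing through the iterator keeps its expected modCount in sync.
    static void removeSilentSafe(Map<String, Long> lastHeard, long now, long timeoutMs) {
        Iterator<Map.Entry<String, Long>> it = lastHeard.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue() > timeoutMs) {
                it.remove();
            }
        }
    }

    // Part 2: stand-in for per-endpoint gossip state (hypothetical).
    static final class EndpointState {
        final boolean alive;
        EndpointState(boolean alive) { this.alive = alive; }
    }

    // Dereferences the lookup directly: if an endpoint's state was evicted
    // but the endpoint is still listed as a replica, get() returns null
    // and the field access throws NullPointerException.
    static boolean isAliveUnsafe(Map<String, EndpointState> states, String endpoint) {
        return states.get(endpoint).alive;
    }

    // Treats an endpoint with no known state as down instead of throwing.
    static boolean isAliveSafe(Map<String, EndpointState> states, String endpoint) {
        EndpointState state = states.get(endpoint);
        return state != null && state.alive;
    }

    // Mimics a write path that checks every replica: one stale endpoint
    // makes the unsafe variant fail on every call, not just once.
    static List<String> liveReplicas(Map<String, EndpointState> states,
                                     List<String> replicas, boolean safe) {
        List<String> live = new ArrayList<>();
        for (String ep : replicas) {
            if (safe ? isAliveSafe(states, ep) : isAliveUnsafe(states, ep)) {
                live.add(ep);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Map<String, Long> lastHeard = new Hashtable<>();
        lastHeard.put("10.0.0.1", 0L);
        lastHeard.put("10.0.0.2", 0L);
        lastHeard.put("10.0.0.3", 9_000L);
        removeSilentSafe(lastHeard, 10_000L, 5_000L);
        System.out.println("still gossiping: " + lastHeard.keySet());

        Map<String, EndpointState> states = new Hashtable<>();
        states.put("10.0.0.1", new EndpointState(true));
        // 10.0.0.2 is still listed as a replica but has no state (evicted).
        List<String> replicas = Arrays.asList("10.0.0.1", "10.0.0.2");
        System.out.println("live replicas: " + liveReplicas(states, replicas, true));
    }
}
```

The safe variants illustrate two common remedies, removing through the iterator and treating missing state as a down node. Whether the 0.6.6 fix for this issue took either approach is not stated in this notification.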
--
This message is automatically generated by JIRA.