[ 
https://issues.apache.org/jira/browse/CASSANDRA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-1463:
--------------------------------------

    Attachment:     (was: 1463-v2.txt)

> Failed bootstrap can cause NPE in batch_mutate on every node, taking down the 
> entire cluster
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1463
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1463
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6
>            Reporter: David King
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.6, 0.7 beta 2
>
>         Attachments: 1463.txt
>
>
> While adding a node to the cluster, the bootstrap failed (still investigating 
> the cause). An hour later, the entire cluster failed and stopped accepting 
> writes. This exception started appearing in the logs:
> {quote}
>  INFO [Timer-0] 2010-09-03 12:23:33,282 Gossiper.java (line 402) FatClient /10.251.243.191 has been silent for 3600000ms, removing from gossip
> ERROR [Timer-0] 2010-09-03 12:23:33,318 Gossiper.java (line 99) Gossip error
> java.util.ConcurrentModificationException
>         at java.util.Hashtable$Enumerator.next(Hashtable.java:1048)
>         at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:383)
>         at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:93)
>         at java.util.TimerThread.mainLoop(Timer.java:534)
>         at java.util.TimerThread.run(Timer.java:484)
> ERROR [pool-1-thread-69153] 2010-09-03 12:23:33,857 Cassandra.java (line 1659) Internal error processing batch_mutate
> java.lang.NullPointerException
>         at org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:135)
>         at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:85)
>         at org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204)
>         at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:415)
>         at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:1651)
>         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1166)
>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:636)
> ERROR [pool-1-thread-69154] 2010-09-03 12:23:33,869 Cassandra.java (line 1659) Internal error processing batch_mutate
> java.lang.NullPointerException
>         at org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:135)
>         at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:85)
>         at org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:204)
>         at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:415)
>         at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:1651)
>         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1166)
>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:636)
> {quote}
> After a large number of repetitions of that (at least thousands), the logged 
> exception was shortened to the form below (this shortening is what made me 
> mistakenly file #1462):
> {quote}
> ERROR [pool-1-thread-68869] 2010-09-03 12:39:22,857 Cassandra.java (line 1659) Internal error processing batch_mutate
> java.lang.NullPointerException
> ERROR [pool-1-thread-68869] 2010-09-03 12:39:22,883 Cassandra.java (line 1659) Internal error processing batch_mutate
> java.lang.NullPointerException
> ERROR [pool-1-thread-68869] 2010-09-03 12:39:22,894 Cassandra.java (line 1659) Internal error processing batch_mutate
> java.lang.NullPointerException
> ERROR [pool-1-thread-68970] 2010-09-03 12:39:22,985 Cassandra.java (line 1659) Internal error processing batch_mutate
> java.lang.NullPointerException
> ERROR [pool-1-thread-68970] 2010-09-03 12:39:23,084 Cassandra.java (line 1659) Internal error processing batch_mutate
> java.lang.NullPointerException
> {quote}
> A rolling restart of the cluster fixed it, but every node had to be 
> restarted before the cluster started accepting writes again.
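
The first trace above (ConcurrentModificationException from
Hashtable$Enumerator.next) is what Java raises when a Hashtable is structurally
modified while one of its iterators is in use. The following is only a minimal
sketch of that general behaviour, not Cassandra's code: the class
GossipCmeSketch, the table contents, and the method names are hypothetical, and
whether Gossiper.doStatusCheck hit exactly this pattern is an assumption. The
second method shows one common way to avoid the exception, iterating over a
snapshot of the keys.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Hashtable;
import java.util.Map;

public class GossipCmeSketch {
    // Hypothetical stand-in for a per-endpoint state table (not Cassandra's actual field).
    static Map<String, Long> newTable() {
        Map<String, Long> lastHeardMillis = new Hashtable<>();
        lastHeardMillis.put("10.0.0.1", 0L);
        lastHeardMillis.put("10.0.0.2", 0L);
        lastHeardMillis.put("10.0.0.3", 0L);
        return lastHeardMillis;
    }

    // Removes entries while iterating the live key set.
    // Returns true if a ConcurrentModificationException was thrown.
    static boolean removeWhileIterating() {
        Map<String, Long> table = newTable();
        try {
            for (String endpoint : table.keySet()) {
                table.remove(endpoint); // structural change during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates a snapshot of the keys, so removals do not disturb the iterator.
    // Returns the number of entries left in the table.
    static int removeViaSnapshot() {
        Map<String, Long> table = newTable();
        for (String endpoint : new ArrayList<>(table.keySet())) {
            table.remove(endpoint);
        }
        return table.size();
    }

    public static void main(String[] args) {
        System.out.println("live iteration threw CME: " + removeWhileIterating());
        System.out.println("entries left after snapshot removal: " + removeViaSnapshot());
    }
}
```

The snapshot costs one copy of the key set per pass, which is usually
acceptable for a periodic task over a small table; a concurrent map is another
option when readers and writers run on different threads.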

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
