[ https://issues.apache.org/jira/browse/CASSANDRA-5815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788228#comment-13788228 ]
Chris Burroughs commented on CASSANDRA-5815:
--------------------------------------------

I'm seeing an NPE in migration manager in 1.2.9, in what I think is the same spot (line numbers changed slightly since July). This occurs on at least one node every time (about 10 attempts) I try to bootstrap with a 2 dc production cluster using the GPFS w/ reconnecting.

{noformat}
ERROR [OptionalTasks:1] 2013-10-07 08:06:05,658 CassandraDaemon.java (line 194) Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
	at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:130)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{noformat}

I added a log message to confirm that Gossiper really really thinks it's not there (off of the 1.2.10 tag if that matters). I'm suspicious of this being a timing problem with the reconnect dance, but I'm not sure how to prove or disprove that.

{noformat}
logger.warn("[csb] Trying to get endpoint state for {} ; exists {}", new Object[] {endpoint, Gossiper.instance.isKnownEndpoint(endpoint)});

INFO [GossipTasks:1] 2013-10-07 11:19:10,565 Gossiper.java (line 803) InetAddress /208.49.103.36 is now DOWN
INFO [GossipTasks:1] 2013-10-07 11:19:13,572 Gossiper.java (line 608) FatClient /208.49.103.36 has been silent for 30000ms, removing from gossip
INFO [HANDSHAKE-/208.49.103.36] 2013-10-07 11:19:13,863 OutboundTcpConnection.java (line 399) Handshaking version with /208.49.103.36
INFO [HANDSHAKE-/208.49.103.36] 2013-10-07 11:19:15,275 OutboundTcpConnection.java (line 399) Handshaking version with /208.49.103.36
WARN [OptionalTasks:1] 2013-10-07 11:19:36,696 MigrationManager.java (line 130) [csb] Trying to get endpoint state for /208.49.103.36 ; exists false
ERROR [OptionalTasks:1] 2013-10-07 11:19:36,696 CassandraDaemon.java (line 193) Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
	at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:131)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{noformat}

> NPE from migration manager
> --------------------------
>
>                 Key: CASSANDRA-5815
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5815
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.12
>            Reporter: Vishy Kasar
>            Assignee: Brandon Williams
>            Priority: Minor
>
> In one of our production clusters we see this error often. Looking through
> the source, Gossiper.instance.getEndpointStateForEndpoint(endpoint) is
> returning null for some endpoint. Do we need any config change on our end to
> resolve this? In any case, Cassandra should be updated to protect against
> this NPE.
>
> ERROR [OptionalTasks:1] 2013-07-24 13:40:38,972 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.NullPointerException
> 	at org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:134)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
>
> It turned out that the reason for the NPE was that we bootstrapped a node
> with the same token as another node. Cassandra should not throw an NPE here
> but log a meaningful error message.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
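
For illustration only, here is a rough sketch of the kind of guard the description asks for: look up the endpoint's state, and if the lookup returns null (the endpoint has dropped out of gossip between scheduling and running the task), log a meaningful message and skip the work instead of dereferencing null. This is not the Cassandra code or the eventual patch. FakeGossiper, getSchemaVersion and shouldPullSchema are hypothetical stand-ins, and the idea that the task compares a remote schema version against the local one is an assumption made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

public class MigrationNullGuardSketch {
    private static final Logger logger =
            Logger.getLogger(MigrationNullGuardSketch.class.getName());

    /** Hypothetical stand-in for gossip state; NOT Cassandra's Gossiper class. */
    static final class FakeGossiper {
        private final Map<String, String> schemaVersionByEndpoint = new HashMap<>();

        void put(String endpoint, String schemaVersion) {
            schemaVersionByEndpoint.put(endpoint, schemaVersion);
        }

        void remove(String endpoint) {
            schemaVersionByEndpoint.remove(endpoint);
        }

        /** Returns null for an unknown endpoint, like the lookup in the report. */
        String getSchemaVersion(String endpoint) {
            return schemaVersionByEndpoint.get(endpoint);
        }
    }

    /**
     * Decides whether the delayed task should do its work for this endpoint.
     * Returns false, after logging, when the endpoint is no longer known.
     */
    static boolean shouldPullSchema(FakeGossiper gossiper, String endpoint, String localVersion) {
        String remoteVersion = gossiper.getSchemaVersion(endpoint);
        if (remoteVersion == null) {
            // Guard: the endpoint was removed from gossip before the task ran.
            logger.warning("No gossip state for " + endpoint
                    + "; it may have been removed from gossip. Skipping.");
            return false;
        }
        return !remoteVersion.equals(localVersion);
    }

    public static void main(String[] args) {
        FakeGossiper gossiper = new FakeGossiper();
        gossiper.put("/208.49.103.36", "v2");
        System.out.println(shouldPullSchema(gossiper, "/208.49.103.36", "v1"));

        // Simulate the endpoint being removed from gossip before the task runs.
        gossiper.remove("/208.49.103.36");
        System.out.println(shouldPullSchema(gossiper, "/208.49.103.36", "v1"));
    }
}
```

The point of the design is only that the missing-endpoint case becomes a logged, recoverable condition rather than an uncaught exception on the OptionalTasks thread. Whether skipping is the right response, as opposed to retrying later, is left open here.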