I am also experiencing issues bootstrapping new nodes in my 2.0.10 Cassandra cluster. The first attempt to bootstrap ALWAYS fails, followed by a second bootstrap attempt that ALWAYS succeeds.
The first attempt at bootstrapping fails with: INFO [main] 2015-03-15 02:41:02,550 StorageService.java (line 966) JOINING: Starting to bootstrap... ERROR [main] 2015-03-15 02:41:02,872 CassandraDaemon.java (line 513) Exception encountered during startup java.lang.IllegalStateException: unable to find sufficient sources for streaming range (7169067280919608187,7171404468239785904] at org.apache.cassandra.dht.RangeStreamer.getRangeFetchMap(RangeStreamer.java:201) at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:125) at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:72) at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:994) at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:797) at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612) at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585) After the failure, I 1. stop the node, 2. clear our the data/saved_caches/commitlog directories, and 3. remove the node from all peers (usually by manually deleting the node from their peers table) 4. restart the node to re-attempt bootstrap The bootstrap always seems to work on this second attempt. I tried comparing the logs from the failed bootstrap and the successful one, and the main difference I see is that the failed bootstrap contains many "unknown endpoint" lines: INFO [main] 2015-03-15 02:40:25,160 StorageService.java (line 966) JOINING: waiting for ring information INFO [HANDSHAKE-/10.30.30.30] 2015-03-15 02:40:25,175 OutboundTcpConnection.java (line 386) Handshaking version with /10.30.30.30 INFO [HANDSHAKE-/10.4.4.4] 2015-03-15 02:40:25,279 OutboundTcpConnection.java (line 386) Handshaking version with /10.4.4.4 INFO [HANDSHAKE-/10.20.20.20] 2015-03-15 02:40:25,383 OutboundTcpConnection.java (line 386) Handshaking version with /10.20.20.20 INFO [HANDSHAKE-/10.10.10.10] 2015-03-15 02:40:25,489 OutboundTcpConnection.java (line 386) Handshaking version with /10.10.10.10 INFO [HANDSHAKE-/10.5.5.5] 2015-03-15 02:40:25,596 OutboundTcpConnection.java (line 386) Handshaking version with /10.5.5.5 INFO [RequestResponseStage:3] 2015-03-15 02:40:25,700 Gossiper.java (line 876) InetAddress /10.2.2.2 is now UP ERROR [MigrationStage:1] 2015-03-15 02:40:25,701 FailureDetector.java (line 200) unknown endpoint /10.2.2.2 ERROR [MigrationStage:1] 2015-03-15 02:40:25,701 MigrationTask.java (line 55) Can't send migration request: node /10.2.2.2 is down. INFO [RequestResponseStage:4] 2015-03-15 02:40:25,716 Gossiper.java (line 876) InetAddress /10.1.1.1 is now UP ERROR [MigrationStage:1] 2015-03-15 02:40:25,716 FailureDetector.java (line 200) unknown endpoint /10.1.1.1 ERROR [MigrationStage:1] 2015-03-15 02:40:25,716 MigrationTask.java (line 55) Can't send migration request: node /10.1.1.1 is down. INFO [RequestResponseStage:1] 2015-03-15 02:40:25,719 Gossiper.java (line 876) InetAddress /10.3.3.3 is now UP ERROR [MigrationStage:1] 2015-03-15 02:40:25,720 FailureDetector.java (line 200) unknown endpoint /10.3.3.3 ERROR [MigrationStage:1] 2015-03-15 02:40:25,720 MigrationTask.java (line 55) Can't send migration request: node /10.3.3.3 is down. INFO [RequestResponseStage:2] 2015-03-15 02:40:25,739 Gossiper.java (line 876) InetAddress /10.4.4.4 is now UP ERROR [MigrationStage:1] 2015-03-15 02:40:25,739 FailureDetector.java (line 200) unknown endpoint /10.4.4.4 ERROR [MigrationStage:1] 2015-03-15 02:40:25,740 MigrationTask.java (line 55) Can't send migration request: node /10.4.4.4 is down. INFO [RequestResponseStage:3] 2015-03-15 02:40:25,742 Gossiper.java (line 876) InetAddress /10.30.30.30 is now UP ERROR [MigrationStage:1] 2015-03-15 02:40:25,743 FailureDetector.java (line 200) unknown endpoint /10.30.30.30 ERROR [MigrationStage:1] 2015-03-15 02:40:25,743 MigrationTask.java (line 55) Can't send migration request: node /10.30.30.30 is down. INFO [RequestResponseStage:4] 2015-03-15 02:40:25,747 Gossiper.java (line 876) InetAddress /10.20.20.20 is now UP ERROR [MigrationStage:1] 2015-03-15 02:40:25,747 FailureDetector.java (line 200) unknown endpoint /10.20.20.20 ERROR [MigrationStage:1] 2015-03-15 02:40:25,748 MigrationTask.java (line 55) Can't send migration request: node /10.20.20.20 is down. INFO [RequestResponseStage:1] 2015-03-15 02:40:25,823 Gossiper.java (line 876) InetAddress /10.5.5.5 is now UP ERROR [MigrationStage:1] 2015-03-15 02:40:25,824 FailureDetector.java (line 200) unknown endpoint /10.5.5.5 ERROR [MigrationStage:1] 2015-03-15 02:40:25,824 MigrationTask.java (line 55) Can't send migration request: node /10.5.5.5 is down. INFO [RequestResponseStage:2] 2015-03-15 02:40:25,825 Gossiper.java (line 876) InetAddress /10.10.10.10 is now UP ERROR [MigrationStage:1] 2015-03-15 02:40:25,825 FailureDetector.java (line 200) unknown endpoint /10.10.10.10 ERROR [MigrationStage:1] 2015-03-15 02:40:25,825 MigrationTask.java (line 55) Can't send migration request: node /10.10.10.10 is down. INFO [HANDSHAKE-/10.6.6.6] 2015-03-15 02:40:26,009 OutboundTcpConnection.java (line 386) Handshaking version with /10.6.6.6 INFO [RequestResponseStage:3] 2015-03-15 02:40:26,430 Gossiper.java (line 876) InetAddress /10.6.6.6 is now UP ERROR [MigrationStage:1] 2015-03-15 02:40:26,430 FailureDetector.java (line 200) unknown endpoint /10.6.6.6 ERROR [MigrationStage:1] 2015-03-15 02:40:26,430 MigrationTask.java (line 55) Can't send migration request: node /10.6.6.6 is down. INFO [GossipStage:1] 2015-03-15 02:40:28,934 Gossiper.java (line 910) Node /10.9.9.9 is now part of the cluster INFO [GossipStage:1] 2015-03-15 02:40:29,009 Gossiper.java (line 910) Node /10.1.1.1 is now part of the cluster INFO [HANDSHAKE-/10.1.1.1] 2015-03-15 02:40:29,010 OutboundTcpConnection.java (line 386) Handshaking version with /10.1.1.1 INFO [RequestResponseStage:4] 2015-03-15 02:40:29,018 Gossiper.java (line 876) InetAddress /10.1.1.1 is now UP INFO [GossipStage:1] 2015-03-15 02:40:29,084 Gossiper.java (line 910) Node /10.8.8.8 is now part of the cluster INFO [HANDSHAKE-/10.8.8.8] 2015-03-15 02:40:29,086 OutboundTcpConnection.java (line 386) Handshaking version with /10.8.8.8 INFO [HANDSHAKE-/10.1.1.1] 2015-03-15 02:40:29,087 OutboundTcpConnection.java (line 386) Handshaking version with /10.1.1.1 INFO [RequestResponseStage:1] 2015-03-15 02:40:29,113 Gossiper.java (line 876) InetAddress /10.8.8.8 is now UP INFO [GossipStage:1] 2015-03-15 02:40:29,159 Gossiper.java (line 910) Node /10.2.2.2 is now part of the cluster INFO [HANDSHAKE-/10.8.8.8] 2015-03-15 02:40:29,160 OutboundTcpConnection.java (line 386) Handshaking version with /10.8.8.8 INFO [HANDSHAKE-/10.2.2.2] 2015-03-15 02:40:29,161 OutboundTcpConnection.java (line 386) Handshaking version with /10.2.2.2 INFO [RequestResponseStage:2] 2015-03-15 02:40:29,166 Gossiper.java (line 876) InetAddress /10.2.2.2 is now UP INFO [GossipStage:1] 2015-03-15 02:40:29,213 Gossiper.java (line 910) Node /10.4.4.4 is now part of the cluster INFO [HANDSHAKE-/10.2.2.2] 2015-03-15 02:40:29,227 OutboundTcpConnection.java (line 386) Handshaking version with /10.2.2.2 INFO [HANDSHAKE-/10.4.4.4] 2015-03-15 02:40:29,243 OutboundTcpConnection.java (line 386) Handshaking version with /10.4.4.4 INFO [RequestResponseStage:3] 2015-03-15 02:40:29,252 Gossiper.java (line 876) InetAddress /10.4.4.4 is now UP INFO [GossipStage:1] 2015-03-15 02:40:29,265 Gossiper.java (line 910) Node /10.6.6.6 is now part of the cluster INFO [HANDSHAKE-/10.6.6.6] 2015-03-15 02:40:29,277 OutboundTcpConnection.java (line 386) Handshaking version with /10.6.6.6 INFO [RequestResponseStage:4] 2015-03-15 02:40:29,283 Gossiper.java (line 876) InetAddress /10.6.6.6 is now UP INFO [HANDSHAKE-/10.4.4.4] 2015-03-15 02:40:29,304 OutboundTcpConnection.java (line 386) Handshaking version with /10.4.4.4 INFO [GossipStage:1] 2015-03-15 02:40:29,314 Gossiper.java (line 910) Node /10.3.3.3 is now part of the cluster INFO [HANDSHAKE-/10.6.6.6] 2015-03-15 02:40:29,342 OutboundTcpConnection.java (line 386) Handshaking version with /10.6.6.6 INFO [HANDSHAKE-/10.3.3.3] 2015-03-15 02:40:29,342 OutboundTcpConnection.java (line 386) Handshaking version with /10.3.3.3 ERROR [MigrationStage:1] 2015-03-15 02:40:29,353 MigrationTask.java (line 55) Can't send migration request: node /10.3.3.3 is down. INFO [GossipStage:1] 2015-03-15 02:40:29,354 Gossiper.java (line 910) Node /10.30.30.30 is now part of the cluster INFO [HANDSHAKE-/10.30.30.30] 2015-03-15 02:40:29,371 OutboundTcpConnection.java (line 386) Handshaking version with /10.30.30.30 INFO [RequestResponseStage:1] 2015-03-15 02:40:29,379 Gossiper.java (line 876) InetAddress /10.30.30.30 is now UP INFO [GossipStage:1] 2015-03-15 02:40:29,387 Gossiper.java (line 910) Node /10.20.20.20 is now part of the cluster INFO [HANDSHAKE-/10.20.20.20] 2015-03-15 02:40:29,394 OutboundTcpConnection.java (line 386) Handshaking version with /10.20.20.20 INFO [HANDSHAKE-/10.30.30.30] 2015-03-15 02:40:29,395 OutboundTcpConnection.java (line 386) Handshaking version with /10.30.30.30 INFO [RequestResponseStage:2] 2015-03-15 02:40:29,403 Gossiper.java (line 876) InetAddress /10.20.20.20 is now UP INFO [InternalResponseStage:1] 2015-03-15 02:40:29,407 ColumnFamilyStore.java (line 794) Enqueuing flush of Memtable-schema_keyspaces@96619019(663/6630 serialized/live bytes, 16 ops) ... INFO [RequestResponseStage:3] 2015-03-15 02:40:38,647 Gossiper.java (line 876) InetAddress /10.3.3.3 is now UP INFO [RequestResponseStage:1] 2015-03-15 02:40:38,647 Gossiper.java (line 876) InetAddress /10.3.3.3 is now UP INFO [RequestResponseStage:4] 2015-03-15 02:40:38,647 Gossiper.java (line 876) InetAddress /10.3.3.3 is now UP INFO [RequestResponseStage:2] 2015-03-15 02:40:38,647 Gossiper.java (line 876) InetAddress /10.3.3.3 is now UP INFO [main] 2015-03-15 02:41:02,550 StorageService.java (line 966) JOINING: Starting to bootstrap... ERROR [main] 2015-03-15 02:41:02,872 CassandraDaemon.java (line 513) Exception encountered during startup java.lang.IllegalStateException: unable to find sufficient sources for streaming range (7169067280919608187,7171404468239785904] There appears to be a disconnect between what the Gossiper is logging (the node is up and endpoint state is saved) and what the MigrationStage is seeing. I can't seem to pinpoint the issue. The other nodes all show UN (Up/Normal) for the ring during the new node's bootstrap. Also this issue is reproducible in our cluster every time we attempt to add a new node (i.e. it is not intermittent or load dependent). We never had this problem before when we were on 1.2.11. Since then, we have 1. upgraded to 2.0.10, 2. migrated our hardware to new i2.xlarge instances (using rsync and the replace address system property) 3. shrunk our cluster down from 16 nodes to 9 (due to the much larger newer instances) I have searched around for this issue, and only really see a couple of references to it: - http://www.mail-archive.com/user%40cassandra.apache.org/msg34971.html - http://www.mail-archive.com/user%40cassandra.apache.org/msg38065.html Any help would be appreciated, Karl