Hi! Today I wanted a new node to join the cluster. When looking at netstats on all the old nodes, it seemed like the streaming sessions did complete. They all said that all files have been transferred. But looking at the debug.log the stream sessions finished with an error. Also after all streams have been done the node remains in state "JOINING".
See logs: debug.log, last words ======== ERROR [StreamReceiveTask:94] 2016-11-25 17:50:51,712 StreamSession.java:593 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] Streaming error occurred on session with peer 10.23.71.6 org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses. at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:525) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:440) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.db.Mutation.apply(Mutation.java:223) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.db.Mutation.applyUnsafe(Mutation.java:242) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:205) ~[apache-cassandra-3.10.jar:3.10] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_102] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_102] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_102] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_102] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102] DEBUG [STREAM-OUT-/10.23.71.6:7000] 2016-11-25 17:50:51,713 ConnectionHandler.java:388 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] Sending Session Failed DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713 StreamSession.java:472 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] Finishing keep-alive task. DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,713 ConnectionHandler.java:120 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] Closing stream connection handler on /10.23.71.6 INFO [StreamReceiveTask:94] 2016-11-25 17:50:51,719 StreamResultFuture.java:187 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] Session with /10.23.71.6 is complete DEBUG [StreamReceiveTask:94] 2016-11-25 17:50:51,719 StreamCoordinator.java:146 - Finished connecting all sessions WARN [StreamReceiveTask:94] 2016-11-25 17:50:51,723 StreamResultFuture.java:214 - [Stream #b998aec0-b2fd-11e6-a63d-75828fa8d45c] Stream failed ERROR [main] 2016-11-25 17:50:51,724 StorageService.java:1493 - Error while waiting on bootstrap to complete. Bootstrap will have to be restarted. java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na] at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na] at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na] at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1488) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:948) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:667) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:598) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:394) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:735) [apache-cassandra-3.10.jar:3.10] Caused by: org.apache.cassandra.streaming.StreamException: Stream failed at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:571) ~[apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:251) ~[apache-cassandra-3.10.jar:3.10] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_102] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_102] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_102] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_102] at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102] WARN [StreamReceiveTask:94] 2016-11-25 17:50:51,731 StorageService.java:1483 - Error during bootstrap. org.apache.cassandra.streaming.StreamException: Stream failed at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88) ~[apache-cassandra-3.10.jar:3.10] at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) [guava-18.0.jar:na] at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) [guava-18.0.jar:na] at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-18.0.jar:na] at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-18.0.jar:na] at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) [guava-18.0.jar:na] at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:215) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:481) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:571) [apache-cassandra-3.10.jar:3.10] at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:251) [apache-cassandra-3.10.jar:3.10] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_102] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_102] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_102] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_102] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102] netstat output of 10.23.71.6, the node mentioned above in the debug.log, obviously all files + bytes have been transferred ========== Mode: NORMAL Bootstrap b998aec0-b2fd-11e6-a63d-75828fa8d45c /10.23.71.8 Sending 1598 files, 60610896516 bytes total. Already sent 1598 files, 60610896516 bytes total /var/lib/cassandra/data/log/log_fake-b130c05070e611e6986e29a4f0eae2e7/mc-97218-big-Data.db 29425392/29425392 bytes(100%) sent to idx:0/10.23.71.8 ... nodetool info on new node says ========== ID : 9dedcc9a-d951-4c7a-b794-434db1af960f Gossip active : true Thrift active : true Native Transport active: true Load : 185.2 GiB Generation No : 1480071319 Uptime (seconds) : 25082 Heap Memory (MB) : 3607.60 / 15974.44 Off Heap Memory (MB) : 405.70 Data Center : DC1 Rack : RAC1 Exceptions : 0 Key Cache : entries 1312994, size 100 MiB, capacity 100 MiB, 126413327 hits, 162672698 requests, 0.777 recent hit rate, 14400 save period in seconds Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds Chunk Cache : entries 7680, size 480 MiB, capacity 480 MiB, 8277584 misses, 147262566 requests, 0.944 recent hit rate, 835.412 microseconds miss latency Percent Repaired : 4.232661592656687% Token : (node is not joined to the cluster) Any idea whats going wrong? Same situation was when I bootstrapped a node last time. In the end I just started the node with auto_bootstrap=false to get it up and running and I ran repair afterwards. I'd like to avoid that repair and all the inconsistencies this time. -- Benjamin Roth Prokurist Jaumo GmbH · www.jaumo.com Wehrstraße 46 · 73035 Göppingen · Germany Phone +49 7161 304880-6 · Fax +49 7161 304880-1 AG Ulm · HRB 731058 · Managing Director: Jens Kammerer