[ https://issues.apache.org/jira/browse/CASSANDRA-10693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Heiko Sommer updated CASSANDRA-10693: ------------------------------------- Description: (This is the original problem description. Later the title of this ticket was changed and new findings were added as a comment.) I'm playing with a test cluster, trying to add a 3rd server while writing to the cluster at max rate (executeAsync with pushing back). Single table for time series data, with RF=2, STCS. The bootstrap failure that I repeatedly get may be the same as in CASSANDRA-9440 (or also CASSANDRA-6565). Even if I could join the 3rd node using temporarily modified configuation ({{streaming_socket_timeout_in_ms}}, {{phi_convict_threshold}} etc), I think that at least improving the logs for this failure case could be worthwhile. When starting the new node (134.171.18.228), the underlying problem seems to happen very early in the bootstrapping, on one of the existing cluster nodes (134.171.18.231): {noformat} INFO [STREAM-INIT-/134.171.18.228:45331] 2015-11-11 16:21:30,344 StreamResultFuture.java:109 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc ID#0] Creating new streaming plan for Bootstrap INFO [STREAM-INIT-/134.171.18.228:45331] 2015-11-11 16:21:30,344 StreamResultFuture.java:116 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, ID#0] Received streaming plan for Bootstrap INFO [STREAM-INIT-/134.171.18.228:45333] 2015-11-11 16:21:30,350 StreamResultFuture.java:116 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, ID#0] Received streaming plan for Bootstrap ERROR [STREAM-IN-/134.171.18.228] 2015-11-11 16:21:41,505 StreamSession.java:505 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Streaming error occurred java.lang.AssertionError: Illegal bounds [2270144..2270152); size: 2270144 at org.apache.cassandra.io.util.Memory.checkBounds(Memory.java:328) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:98) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.util.Memory.getLong(Memory.java:249) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.compress.CompressionMetadata.getChunksForSections(CompressionMetadata.java:253) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.streaming.messages.OutgoingFileMessage.<init>(OutgoingFileMessage.java:70) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.streaming.StreamTransferTask.addTransferFile(StreamTransferTask.java:58) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.streaming.StreamSession.addTransferFiles(StreamSession.java:378) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.streaming.StreamSession.addTransferRanges(StreamSession.java:272) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:521) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:453) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:256) ~[apache-cassandra-2.1.11.jar:2.1.11] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40] INFO [STREAM-IN-/134.171.18.228] 2015-11-11 16:21:41,510 StreamResultFuture.java:180 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Session with /134.171.18.228 is complete WARN [STREAM-IN-/134.171.18.228] 2015-11-11 16:21:41,511 StreamResultFuture.java:207 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Stream failed {noformat} On the joining node, this failure is not reported at all: {noformat} INFO [main] 2015-11-11 16:21:30,000 StorageService.java:1120 - JOINING: Starting to bootstrap... INFO [main] 2015-11-11 16:21:30,336 StreamResultFuture.java:86 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Executing streaming plan for Bootstrap INFO [StreamConnectionEstablisher:2] 2015-11-11 16:21:30,337 StreamSession.java:220 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Starting streaming to /134.171.18.229 INFO [StreamConnectionEstablisher:1] 2015-11-11 16:21:30,337 StreamSession.java:220 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Starting streaming to /134.171.18.231 INFO [StreamConnectionEstablisher:1] 2015-11-11 16:21:30,348 StreamCoordinator.java:209 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, ID#0] Beginning stream session with /134.171.18.231 INFO [StreamConnectionEstablisher:2] 2015-11-11 16:21:30,348 StreamCoordinator.java:209 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, ID#0] Beginning stream session with /134.171.18.229 INFO [STREAM-IN-/134.171.18.231] 2015-11-11 16:21:41,508 StreamResultFuture.java:180 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Session with /134.171.18.231 is complete INFO [STREAM-IN-/134.171.18.229] 2015-11-11 16:21:42,639 StreamResultFuture.java:166 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc ID#0] Prepare completed. Receiving 66 files(120500434815 bytes), sending 0 files(0 bytes) {noformat} However, ~9 hours later, the bootstrapping fails, with logs: {noformat} INFO [StreamReceiveTask:8] 2015-11-12 01:25:26,837 StreamResultFuture.java:180 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Session with /134.171.18.229 is complete WARN [StreamReceiveTask:8] 2015-11-12 01:25:26,839 StreamResultFuture.java:207 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Stream failed ERROR [main] 2015-11-12 01:25:26,843 CassandraDaemon.java:579 - Exception encountered during startup java.lang.RuntimeException: Error during boostrap: Stream failed at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:86) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1148) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:938) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:734) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:611) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387) [apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562) [apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651) [apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] Caused by: org.apache.cassandra.streaming.StreamException: Stream failed at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:415) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:688) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:651) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:139) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_40] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_40] at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_40] WARN [StorageServiceShutdownHook] 2015-11-12 01:25:26,855 Gossiper.java:1453 - No local state or state is in silent shutdown, not announcing shutdown {noformat} I hacked {{StreamState#hasFailedSession()}} to log all failed sessions, and it showed exactly one to the node that had the "AssertionError: Illegal bounds" before. It would be great to get some help with the following questions: * What could cause the "AssertionError: Illegal bounds"? I'm now running a cleanup just in case and will re-run later. * Is the streaming failure with one of the existing cluster nodes considered so unimportant (because other nodes are supposed to cover this) that it should not be logged on the bootstrapping node? In that case the failure should not let the bootstrapping fail, I suppose. * Is it normal that bootstrapping can take 9 hours, or am I doing something fundamentally wrong? * OpsCenter 5.2.2 shows my new node for the whole time under "Unknown Datacenter", even though "nodetool status" shows it as joining the existing datacenter. Is that an indication of a problem in my setup, or just the way that OpsCenter displays joining nodes? was: I'm playing with a test cluster, trying to add a 3rd server while writing to the cluster at max rate (executeAsync with pushing back). Single table for time series data, with RF=2, STCS. The bootstrap failure that I repeatedly get may be the same as in CASSANDRA-9440 (or also CASSANDRA-6565). Even if I could join the 3rd node using temporarily modified configuation ({{streaming_socket_timeout_in_ms}}, {{phi_convict_threshold}} etc), I think that at least improving the logs for this failure case could be worthwhile. When starting the new node (134.171.18.228), the underlying problem seems to happen very early in the bootstrapping, on one of the existing cluster nodes (134.171.18.231): {noformat} INFO [STREAM-INIT-/134.171.18.228:45331] 2015-11-11 16:21:30,344 StreamResultFuture.java:109 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc ID#0] Creating new streaming plan for Bootstrap INFO [STREAM-INIT-/134.171.18.228:45331] 2015-11-11 16:21:30,344 StreamResultFuture.java:116 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, ID#0] Received streaming plan for Bootstrap INFO [STREAM-INIT-/134.171.18.228:45333] 2015-11-11 16:21:30,350 StreamResultFuture.java:116 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, ID#0] Received streaming plan for Bootstrap ERROR [STREAM-IN-/134.171.18.228] 2015-11-11 16:21:41,505 StreamSession.java:505 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Streaming error occurred java.lang.AssertionError: Illegal bounds [2270144..2270152); size: 2270144 at org.apache.cassandra.io.util.Memory.checkBounds(Memory.java:328) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:98) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.util.Memory.getLong(Memory.java:249) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.io.compress.CompressionMetadata.getChunksForSections(CompressionMetadata.java:253) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.streaming.messages.OutgoingFileMessage.<init>(OutgoingFileMessage.java:70) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.streaming.StreamTransferTask.addTransferFile(StreamTransferTask.java:58) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.streaming.StreamSession.addTransferFiles(StreamSession.java:378) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.streaming.StreamSession.addTransferRanges(StreamSession.java:272) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:521) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:453) ~[apache-cassandra-2.1.11.jar:2.1.11] at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:256) ~[apache-cassandra-2.1.11.jar:2.1.11] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40] INFO [STREAM-IN-/134.171.18.228] 2015-11-11 16:21:41,510 StreamResultFuture.java:180 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Session with /134.171.18.228 is complete WARN [STREAM-IN-/134.171.18.228] 2015-11-11 16:21:41,511 StreamResultFuture.java:207 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Stream failed {noformat} On the joining node, this failure is not reported at all: {noformat} INFO [main] 2015-11-11 16:21:30,000 StorageService.java:1120 - JOINING: Starting to bootstrap... INFO [main] 2015-11-11 16:21:30,336 StreamResultFuture.java:86 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Executing streaming plan for Bootstrap INFO [StreamConnectionEstablisher:2] 2015-11-11 16:21:30,337 StreamSession.java:220 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Starting streaming to /134.171.18.229 INFO [StreamConnectionEstablisher:1] 2015-11-11 16:21:30,337 StreamSession.java:220 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Starting streaming to /134.171.18.231 INFO [StreamConnectionEstablisher:1] 2015-11-11 16:21:30,348 StreamCoordinator.java:209 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, ID#0] Beginning stream session with /134.171.18.231 INFO [StreamConnectionEstablisher:2] 2015-11-11 16:21:30,348 StreamCoordinator.java:209 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, ID#0] Beginning stream session with /134.171.18.229 INFO [STREAM-IN-/134.171.18.231] 2015-11-11 16:21:41,508 StreamResultFuture.java:180 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Session with /134.171.18.231 is complete INFO [STREAM-IN-/134.171.18.229] 2015-11-11 16:21:42,639 StreamResultFuture.java:166 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc ID#0] Prepare completed. Receiving 66 files(120500434815 bytes), sending 0 files(0 bytes) {noformat} However, ~9 hours later, the bootstrapping fails, with logs: {noformat} INFO [StreamReceiveTask:8] 2015-11-12 01:25:26,837 StreamResultFuture.java:180 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Session with /134.171.18.229 is complete WARN [StreamReceiveTask:8] 2015-11-12 01:25:26,839 StreamResultFuture.java:207 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Stream failed ERROR [main] 2015-11-12 01:25:26,843 CassandraDaemon.java:579 - Exception encountered during startup java.lang.RuntimeException: Error during boostrap: Stream failed at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:86) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1148) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:938) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:734) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.service.StorageService.initServer(StorageService.java:611) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387) [apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562) [apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651) [apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] Caused by: org.apache.cassandra.streaming.StreamException: Stream failed at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:415) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:688) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:651) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:139) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_40] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_40] at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_40] WARN [StorageServiceShutdownHook] 2015-11-12 01:25:26,855 Gossiper.java:1453 - No local state or state is in silent shutdown, not announcing shutdown {noformat} I hacked {{StreamState#hasFailedSession()}} to log all failed sessions, and it showed exactly one to the node that had the "AssertionError: Illegal bounds" before. It would be great to get some help with the following questions: * What could cause the "AssertionError: Illegal bounds"? I'm now running a cleanup just in case and will re-run later. * Is the streaming failure with one of the existing cluster nodes considered so unimportant (because other nodes are supposed to cover this) that it should not be logged on the bootstrapping node? In that case the failure should not let the bootstrapping fail, I suppose. * Is it normal that bootstrapping can take 9 hours, or am I doing something fundamentally wrong? * OpsCenter 5.2.2 shows my new node for the whole time under "Unknown Datacenter", even though "nodetool status" shows it as joining the existing datacenter. Is that an indication of a problem in my setup, or just the way that OpsCenter displays joining nodes? Summary: Bootstrapping needs improved error handling for disk failure on existing node (was: Bootstrapping failure that at least needs improved logging) I'm changing the title of this ticket from "Bootstrapping failure that at least needs improved logging" to "Bootstrapping needs improved error handling for disk failure on existing node" because I now understand better where the problem came from. One of the JBOD disks on the existing node 134.171.18.231 had problems and produced read errors on various sstable files. The disk errors must have caused the "AssertionError: Illegal bounds" exception seen on the existing node early in the streaming of data to the joining node. For some reason Cassandra did not stop the entire node though, in spite of CorruptSSTableException logged on other occasions. After replacing that disk and running a full repair, the new node 134.171.18.228 bootstrapped successfully. I can answer my other questions from before: It is apparently normal for bootstrapping to run over many hours, without running into timeout problems. OpsCenter shows the joining node as "Unknown Datacenter" for the whole time, even though in the help message this is mentioned only as a short state lasting a few minutes. Once the node has joined, OpsCenter shows it correctly as part of its datacenter, and includes it in the performance metrics display. Still I think that Cassandra's error handling should be improved: * With RF=2, bootstrapping should have succeeded even if one of the two existing nodes had disk read errors. Is it true that currently the joining node lets it fail in the end if any stream had failed, without trying to get the data from another node? * If for some reason (Write CL=1) the missing data was not available from the other node, then bootstrapping should have failed right away, not only after 9 hours. * In either case, a log at INFO level or above on the joining node would have been nice. > Bootstrapping needs improved error handling for disk failure on existing node > ----------------------------------------------------------------------------- > > Key: CASSANDRA-10693 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10693 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: Cassandra 2.1.11 > Cluster of 2 or 3 xeon servers (8 core, 24 GB RAM each) > Reporter: Heiko Sommer > > (This is the original problem description. Later the title of this ticket was > changed and new findings were added as a comment.) > I'm playing with a test cluster, trying to add a 3rd server while writing to > the cluster at max rate (executeAsync with pushing back). Single table for > time series data, with RF=2, STCS. > The bootstrap failure that I repeatedly get may be the same as in > CASSANDRA-9440 (or also CASSANDRA-6565). > Even if I could join the 3rd node using temporarily modified configuation > ({{streaming_socket_timeout_in_ms}}, {{phi_convict_threshold}} etc), I think > that at least improving the logs for this failure case could be worthwhile. > When starting the new node (134.171.18.228), the underlying problem seems to > happen very early in the bootstrapping, on one of the existing cluster nodes > (134.171.18.231): > {noformat} > INFO [STREAM-INIT-/134.171.18.228:45331] 2015-11-11 16:21:30,344 > StreamResultFuture.java:109 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc > ID#0] Creating new streaming plan for Bootstrap > INFO [STREAM-INIT-/134.171.18.228:45331] 2015-11-11 16:21:30,344 > StreamResultFuture.java:116 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, > ID#0] Received streaming plan for Bootstrap > INFO [STREAM-INIT-/134.171.18.228:45333] 2015-11-11 16:21:30,350 > StreamResultFuture.java:116 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, > ID#0] Received streaming plan for Bootstrap > ERROR [STREAM-IN-/134.171.18.228] 2015-11-11 16:21:41,505 > StreamSession.java:505 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] > Streaming error occurred > java.lang.AssertionError: Illegal bounds [2270144..2270152); size: 2270144 > at org.apache.cassandra.io.util.Memory.checkBounds(Memory.java:328) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:98) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at org.apache.cassandra.io.util.Memory.getLong(Memory.java:249) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.io.compress.CompressionMetadata.getChunksForSections(CompressionMetadata.java:253) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.streaming.messages.OutgoingFileMessage.<init>(OutgoingFileMessage.java:70) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.streaming.StreamTransferTask.addTransferFile(StreamTransferTask.java:58) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.streaming.StreamSession.addTransferFiles(StreamSession.java:378) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.streaming.StreamSession.addTransferRanges(StreamSession.java:272) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:521) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:453) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at > org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:256) > ~[apache-cassandra-2.1.11.jar:2.1.11] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40] > INFO [STREAM-IN-/134.171.18.228] 2015-11-11 16:21:41,510 > StreamResultFuture.java:180 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] > Session with /134.171.18.228 is complete > WARN [STREAM-IN-/134.171.18.228] 2015-11-11 16:21:41,511 > StreamResultFuture.java:207 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] > Stream failed > {noformat} > On the joining node, this failure is not reported at all: > {noformat} > INFO [main] 2015-11-11 16:21:30,000 StorageService.java:1120 - JOINING: > Starting to bootstrap... > INFO [main] 2015-11-11 16:21:30,336 StreamResultFuture.java:86 - [Stream > #43deef20-8890-11e5-a633-c16ac7efddbc] Executing streaming plan for Bootstrap > INFO [StreamConnectionEstablisher:2] 2015-11-11 16:21:30,337 > StreamSession.java:220 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] > Starting streaming to /134.171.18.229 > INFO [StreamConnectionEstablisher:1] 2015-11-11 16:21:30,337 > StreamSession.java:220 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] > Starting streaming to /134.171.18.231 > INFO [StreamConnectionEstablisher:1] 2015-11-11 16:21:30,348 > StreamCoordinator.java:209 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, > ID#0] Beginning stream session with /134.171.18.231 > INFO [StreamConnectionEstablisher:2] 2015-11-11 16:21:30,348 > StreamCoordinator.java:209 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, > ID#0] Beginning stream session with /134.171.18.229 > INFO [STREAM-IN-/134.171.18.231] 2015-11-11 16:21:41,508 > StreamResultFuture.java:180 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] > Session with /134.171.18.231 is complete > INFO [STREAM-IN-/134.171.18.229] 2015-11-11 16:21:42,639 > StreamResultFuture.java:166 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc > ID#0] Prepare completed. Receiving 66 files(120500434815 bytes), sending 0 > files(0 bytes) > {noformat} > However, ~9 hours later, the bootstrapping fails, with logs: > {noformat} > INFO [StreamReceiveTask:8] 2015-11-12 01:25:26,837 > StreamResultFuture.java:180 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] > Session with /134.171.18.229 is complete > WARN [StreamReceiveTask:8] 2015-11-12 01:25:26,839 > StreamResultFuture.java:207 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] > Stream failed > ERROR [main] 2015-11-12 01:25:26,843 CassandraDaemon.java:579 - Exception > encountered during startup > java.lang.RuntimeException: Error during boostrap: Stream failed > at > org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:86) > ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] > at > org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1148) > ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:938) > ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:734) > ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:611) > ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387) > [apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562) > [apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651) > [apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208) > ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] > at > org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184) > ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] > at > org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:415) > ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] > at > org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:688) > ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] > at > org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:651) > ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] > at > org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:139) > ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_40] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_40] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[na:1.8.0_40] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > ~[na:1.8.0_40] > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_40] > WARN [StorageServiceShutdownHook] 2015-11-12 01:25:26,855 Gossiper.java:1453 > - No local state or state is in silent shutdown, not announcing shutdown > {noformat} > I hacked {{StreamState#hasFailedSession()}} to log all failed sessions, and > it showed exactly one to the node that had the "AssertionError: Illegal > bounds" before. > It would be great to get some help with the following questions: > * What could cause the "AssertionError: Illegal bounds"? I'm now running a > cleanup just in case and will re-run later. > * Is the streaming failure with one of the existing cluster nodes considered > so unimportant (because other nodes are supposed to cover this) that it > should not be logged on the bootstrapping node? In that case the failure > should not let the bootstrapping fail, I suppose. > * Is it normal that bootstrapping can take 9 hours, or am I doing something > fundamentally wrong? > * OpsCenter 5.2.2 shows my new node for the whole time under "Unknown > Datacenter", even though "nodetool status" shows it as joining the existing > datacenter. Is that an indication of a problem in my setup, or just the way > that OpsCenter displays joining nodes? -- This message was sent by Atlassian JIRA (v6.3.4#6332)