good point! on the source side i can see the following error:
ERROR [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamSession.java:529 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Streaming error occurred on session with peer 10.192.116.1 through 192.168.0.114
org.apache.cassandra.io.FSReadError: java.io.IOException: Broken pipe
    at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:145) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.compress.CompressedStreamWriter.lambda$write$0(CompressedStreamWriter.java:90) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.applyToChannel(BufferedDataOutputStreamPlus.java:350) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:90) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:91) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:48) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:40) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:48) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:370) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:342) ~[apache-cassandra-3.7.jar:3.7]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
Caused by: java.io.IOException: Broken pipe
    at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) ~[na:1.8.0_77]
    at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428) ~[na:1.8.0_77]
    at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493) ~[na:1.8.0_77]
    at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:608) ~[na:1.8.0_77]
    at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:141) ~[apache-cassandra-3.7.jar:3.7]
    ... 10 common frames omitted
DEBUG [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 ConnectionHandler.java:110 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Closing stream connection handler on /10.192.116.1
INFO [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamResultFuture.java:187 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Session with /10.192.116.1 is complete
WARN [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamResultFuture.java:214 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Stream failed

the dataset is approx. 300 GB per node.

does that mean that cassandra does not try to reconnect (for streaming) in case of short network dropouts?

On Fri, 2017-04-07 at 08:53 -0400, Jacob Shadix wrote:

Did you look at the logs on the source DC as well? How big is the dataset?

-- Jacob Shadix

On Fri, Apr 7, 2017 at 7:16 AM, Roland Otta <roland.o...@willhaben.at> wrote:

Hi! we are on 3.7.

we have some debug messages ... but i guess they are not related to that issue:

DEBUG [GossipStage:1] 2017-04-07 13:11:00,440 FailureDetector.java:456 - Ignoring interval time of 2002469610 for /192.168.0.27
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - Ignoring interval time of 2598593732 for /10.192.116.4
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - Ignoring interval time of 2002612298 for /10.192.116.5
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - Ignoring interval time of 2002660534 for /10.192.116.9
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - Ignoring interval time of 2027212880 for /10.192.116.3
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - Ignoring interval time of 2027279042 for /192.168.0.188
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - Ignoring interval time of 2027313992 for /10.192.116.10

besides that, the debug.log is clean.

all the mentioned cassandra.yaml parameters are the shipped defaults (streaming_socket_timeout_in_ms does not exist at all in my cassandra.yaml).

i also checked the pending compactions. there are no pending compactions at the moment.

bg - roland otta

On Fri, 2017-04-07 at 06:47 -0400, Jacob Shadix wrote:

What version are you running? Do you see any errors in the system.log (SocketTimeout, for instance)?

And what values do you have for the following in cassandra.yaml:
- stream_throughput_outbound_megabits_per_sec
- compaction_throughput_mb_per_sec
- streaming_socket_timeout_in_ms

-- Jacob Shadix

On Fri, Apr 7, 2017 at 6:00 AM, Roland Otta <roland.o...@willhaben.at> wrote:

hi,

we are trying to set up a new datacenter and are initializing the data with nodetool rebuild.
after some hours it seems that the node stopped streaming (at least there is no more streaming traffic on the network interface).

nodetool netstats shows that the streaming is still in progress:

Mode: NORMAL
Bootstrap 6918dc90-1ad6-11e7-9f16-51230e2be4e9
Rebuild 41606030-1ad9-11e7-9f16-51230e2be4e9
    /192.168.0.26
        Receiving 257 files, 145444246572 bytes total. Already received 1 files, 1744027 bytes total
            bds/adcounter_total 76456/47310255 bytes(0%) received from idx:0/192.168.0.26
            bds/upselling_event 1667571/1667571 bytes(100%) received from idx:0/192.168.0.26
    /192.168.0.188
    /192.168.0.27
        Receiving 169 files, 79355302464 bytes total. Already received 1 files, 81585975 bytes total
            bds/ad_event_history 81585975/81585975 bytes(100%) received from idx:0/192.168.0.27
    /192.168.0.189
        Receiving 140 files, 19673034809 bytes total. Already received 1 files, 5996604 bytes total
            bds/adcounter_per_day 5956840/42259846 bytes(14%) received from idx:0/192.168.0.189
            bds/user_event 39764/39764 bytes(100%) received from idx:0/192.168.0.189
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name          Active   Pending   Completed   Dropped
Large messages        n/a         2           3         0
Small messages        n/a         0    68632465         0
Gossip messages       n/a         0      217661         0

it has been in that state for approx. 15 hours now.

does it make sense to wait for the streaming to finish, or do i have to restart the node, discard the data and restart the rebuild?
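
Editor's note: to judge whether a rebuild like the one above is still moving, the per-peer summary lines of `nodetool netstats` can be turned into progress figures. The following is only a rough sketch in Python, assuming the summary wording shown in this thread (Cassandra 3.7); other versions may print it differently. The names `PEER_SUMMARY`, `streaming_progress` and `SAMPLE` are made up for this illustration and are not part of Cassandra or nodetool. By the byte counts quoted above, each of the three receiving sessions is well under 1% complete.

```python
import re

# One per-peer summary as printed by `nodetool netstats` in this thread.
# Assumption: the wording matches Cassandra 3.7; adjust for other versions.
PEER_SUMMARY = re.compile(
    r"(?P<peer>/\d+\.\d+\.\d+\.\d+)\s+"
    r"Receiving (?P<files_total>\d+) files, (?P<bytes_total>\d+) bytes total\.\s+"
    r"Already received (?P<files_done>\d+) files, (?P<bytes_done>\d+) bytes total"
)


def streaming_progress(netstats_text):
    """Return {peer: (bytes_done, bytes_total, fraction_done)} for receiving sessions."""
    progress = {}
    for match in PEER_SUMMARY.finditer(netstats_text):
        done = int(match.group("bytes_done"))
        total = int(match.group("bytes_total"))
        fraction = done / total if total else 0.0
        progress[match.group("peer")] = (done, total, fraction)
    return progress


# Summary lines copied from the netstats output quoted in this thread.
SAMPLE = """\
/192.168.0.26
    Receiving 257 files, 145444246572 bytes total. Already received 1 files, 1744027 bytes total
/192.168.0.188
/192.168.0.27
    Receiving 169 files, 79355302464 bytes total. Already received 1 files, 81585975 bytes total
/192.168.0.189
    Receiving 140 files, 19673034809 bytes total. Already received 1 files, 5996604 bytes total
"""

if __name__ == "__main__":
    for peer, (done, total, fraction) in streaming_progress(SAMPLE).items():
        print(f"{peer}: {done}/{total} bytes ({fraction:.4%})")
```

One possible use: save the output of `nodetool netstats` twice, some minutes apart, run both through `streaming_progress`, and compare the byte counts. If they do not change between runs, that would be consistent with a stalled session, such as the one that failed with "Broken pipe" on the source side.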