Re: cassandra node stops streaming data during nodetool rebuild

Roland Otta Fri, 07 Apr 2017 06:23:46 -0700

good point!

on the source side i can see the following error


ERROR [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamSession.java:529 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Streaming error occurred on session with peer 10.192.116.1 through 192.168.0.114
org.apache.cassandra.io.FSReadError: java.io.IOException: Broken pipe
        at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:145) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.streaming.compress.CompressedStreamWriter.lambda$write$0(CompressedStreamWriter.java:90) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.applyToChannel(BufferedDataOutputStreamPlus.java:350) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:90) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:91) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:48) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:40) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:48) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:370) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:342) ~[apache-cassandra-3.7.jar:3.7]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
Caused by: java.io.IOException: Broken pipe
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) ~[na:1.8.0_77]
        at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428) ~[na:1.8.0_77]
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493) ~[na:1.8.0_77]
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:608) ~[na:1.8.0_77]
        at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:141) ~[apache-cassandra-3.7.jar:3.7]
        ... 10 common frames omitted
DEBUG [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 ConnectionHandler.java:110 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Closing stream connection handler on /10.192.116.1
INFO  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamResultFuture.java:187 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Session with /10.192.116.1 is complete
WARN  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamResultFuture.java:214 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Stream failed
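
to find such entries on the other source nodes, a small script can help. this is only a minimal sketch: the function name failed_streams and the regular expression are my own assumptions about the log layout shown above (one entry per line, as in the real system.log, not wrapped by the mail client), not anything shipped with cassandra.

```python
import re

# Assumed layout of a log line that carries a stream id:
# LEVEL [thread] date time File.java:NN - [Stream #<uuid>] message
STREAM_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"\S+\s+-\s+\[Stream #(?P<stream>[0-9a-f-]+)\]\s+(?P<msg>.*)$"
)


def failed_streams(lines):
    """Map stream id -> list of (timestamp, message) for ERROR/WARN lines."""
    found = {}
    for line in lines:
        m = STREAM_LINE.match(line.strip())
        if m and m.group("level") in ("ERROR", "WARN"):
            entry = (m.group("ts"), m.group("msg"))
            found.setdefault(m.group("stream"), []).append(entry)
    return found


# Three of the log lines from this mail, unwrapped.
sample_log = [
    "ERROR [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 "
    "StreamSession.java:529 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] "
    "Streaming error occurred on session with peer 10.192.116.1 through 192.168.0.114",
    "INFO  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 "
    "StreamResultFuture.java:187 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] "
    "Session with /10.192.116.1 is complete",
    "WARN  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 "
    "StreamResultFuture.java:214 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] "
    "Stream failed",
]

if __name__ == "__main__":
    for stream_id, entries in failed_streams(sample_log).items():
        for ts, msg in entries:
            print(stream_id, ts, msg)
```

to scan a real log, pass an open file instead of sample_log (the file path depends on the installation).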


the dataset is approx 300GB / Node.

does that mean that cassandra does not try to reconnect (for streaming) in case 
of short network dropouts?

On Fri, 2017-04-07 at 08:53 -0400, Jacob Shadix wrote:
Did you look at the logs on the source DC as well? How big is the dataset?

-- Jacob Shadix

On Fri, Apr 7, 2017 at 7:16 AM, Roland Otta <roland.o...@willhaben.at> wrote:
Hi!

we are on 3.7.

we have some debug messages ... but i guess they are not related to that issue
DEBUG [GossipStage:1] 2017-04-07 13:11:00,440 FailureDetector.java:456 - Ignoring interval time of 2002469610 for /192.168.0.27
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - Ignoring interval time of 2598593732 for /10.192.116.4
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - Ignoring interval time of 2002612298 for /10.192.116.5
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - Ignoring interval time of 2002660534 for /10.192.116.9
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - Ignoring interval time of 2027212880 for /10.192.116.3
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - Ignoring interval time of 2027279042 for /192.168.0.188
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - Ignoring interval time of 2027313992 for /10.192.116.10

besides that the debug.log is clean

all the mentioned cassandra.yaml parameters are the shipped defaults 
(streaming_socket_timeout_in_ms does not exist at all in my cassandra.yaml).
i also checked the pending compactions. there are none at the moment.

bg - roland otta

On Fri, 2017-04-07 at 06:47 -0400, Jacob Shadix wrote:
What version are you running? Do you see any errors in the system.log 
(SocketTimeout, for instance)?

And what values do you have for the following in cassandra.yaml:
- stream_throughput_outbound_megabits_per_sec
- compaction_throughput_mb_per_sec
- streaming_socket_timeout_in_ms

-- Jacob Shadix

On Fri, Apr 7, 2017 at 6:00 AM, Roland Otta <roland.o...@willhaben.at> wrote:
hi,

we are trying to set up a new datacenter and are initializing the data
with nodetool rebuild.

after some hours it seems that the node stopped streaming (at least
there is no more streaming traffic on the network interface).

nodetool netstats shows that the streaming is still in progress

Mode: NORMAL
Bootstrap 6918dc90-1ad6-11e7-9f16-51230e2be4e9
Rebuild 41606030-1ad9-11e7-9f16-51230e2be4e9
    /192.168.0.26
        Receiving 257 files, 145444246572 bytes total. Already received 1 files, 1744027 bytes total
            bds/adcounter_total 76456/47310255 bytes(0%) received from idx:0/192.168.0.26
            bds/upselling_event 1667571/1667571 bytes(100%) received from idx:0/192.168.0.26
    /192.168.0.188
    /192.168.0.27
        Receiving 169 files, 79355302464 bytes total. Already received 1 files, 81585975 bytes total
            bds/ad_event_history 81585975/81585975 bytes(100%) received from idx:0/192.168.0.27
    /192.168.0.189
        Receiving 140 files, 19673034809 bytes total. Already received 1 files, 5996604 bytes total
            bds/adcounter_per_day 5956840/42259846 bytes(14%) received from idx:0/192.168.0.189
            bds/user_event 39764/39764 bytes(100%) received from idx:0/192.168.0.189
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         2              3         0
Small messages                  n/a         0       68632465         0
Gossip messages                 n/a         0         217661         0
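
to put numbers on how far the rebuild got, the per-peer "Receiving ... Already received ..." lines above can be summed up. a minimal sketch, assuming the netstats text looks exactly like the output above; the function name rebuild_progress is made up for this illustration and is not part of nodetool.

```python
import re

# Matches one per-peer summary of `nodetool netstats`, tolerating line wraps:
# "Receiving N files, X bytes total. Already received M files, Y bytes total"
SUMMARY = re.compile(
    r"Receiving\s+(\d+)\s+files,\s+(\d+)\s+bytes\s+total\.\s+"
    r"Already\s+received\s+(\d+)\s+files,\s+(\d+)\s+bytes\s+total"
)


def rebuild_progress(netstats_text):
    """Return (received_bytes, total_bytes, fraction) summed over all peers."""
    total = 0
    received = 0
    for m in SUMMARY.finditer(netstats_text):
        total += int(m.group(2))
        received += int(m.group(4))
    fraction = received / total if total else 0.0
    return received, total, fraction


# The three summary lines from the output above, wrapped as in the mail.
sample = """
        Receiving 257 files, 145444246572 bytes total. Already received
1 files, 1744027 bytes total
        Receiving 169 files, 79355302464 bytes total. Already received
1 files, 81585975 bytes total
        Receiving 140 files, 19673034809 bytes total. Already received
1 files, 5996604 bytes total
"""

if __name__ == "__main__":
    received, total, fraction = rebuild_progress(sample)
    print(f"{received} of {total} bytes received ({fraction:.4%})")
```

for the output above this gives 89326606 of 244472583845 bytes, i.e. well under 0.1% of the expected data.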



it is in that state for approx 15 hours now

does it make sense waiting for the streaming to finish or do i have to
restart the node, discard data and restart the rebuild?
