Re: cassandra node stops streaming data during nodetool rebuild

2017-04-07 Thread Jacob Shadix
I don't see an issue with the size of the data / node. You can attempt the
rebuild again and play around with throughput if your network can handle it.

It can be changed on the fly with nodetool (the value is in megabits per
second; 0 disables throttling):

 nodetool setstreamthroughput <megabits_per_sec>

This article is also worth a read -
https://support.datastax.com/hc/en-us/articles/205409646-How-to-performance-tune-data-streaming-activities-like-repair-and-bootstrap
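
As a rough aid for picking a throughput value, the sketch below estimates a lower bound on how long each source node would need to send its share under a given cap. It assumes the cap applies per sending node, that the network sustains it, and that one megabit is 10**6 bits. The function name and the example caps are illustrative and do not come from Cassandra; the byte counts are the totals from the netstats output in this thread.

```python
def min_stream_hours(total_bytes, cap_megabits_per_sec):
    """Lower bound on the hours one source node needs to send total_bytes under the cap."""
    if cap_megabits_per_sec <= 0:
        raise ValueError("a cap of 0 means unthrottled, so no bound can be computed")
    # Assumption: 1 megabit = 10**6 bits.
    bytes_per_sec = cap_megabits_per_sec * 1_000_000 / 8
    return total_bytes / bytes_per_sec / 3600


# Total bytes expected from each source node, copied from the netstats output in this thread.
EXPECTED_BYTES = {
    "192.168.0.26": 145444246572,
    "192.168.0.27": 79355302464,
    "192.168.0.189": 19673034809,
}

for cap in (100, 200, 400):  # example caps in megabits per second, not recommendations
    slowest = max(min_stream_hours(size, cap) for size in EXPECTED_BYTES.values())
    print(f"cap {cap} Mb/s: slowest source needs at least {slowest:.1f} h")
```

If a rebuild has run far longer than such a bound without finishing, the throttle alone is unlikely to be the cause.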

-- Jacob Shadix


Re: cassandra node stops streaming data during nodetool rebuild

2017-04-07 Thread Roland Otta
good point!

on the source side i can see the following error

ERROR [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamSession.java:529 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Streaming error occurred on session with peer 10.192.116.1 through 192.168.0.114
org.apache.cassandra.io.FSReadError: java.io.IOException: Broken pipe
    at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:145) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.compress.CompressedStreamWriter.lambda$write$0(CompressedStreamWriter.java:90) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.applyToChannel(BufferedDataOutputStreamPlus.java:350) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:90) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:91) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:48) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:40) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:48) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:370) ~[apache-cassandra-3.7.jar:3.7]
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:342) ~[apache-cassandra-3.7.jar:3.7]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
Caused by: java.io.IOException: Broken pipe
    at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) ~[na:1.8.0_77]
    at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428) ~[na:1.8.0_77]
    at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493) ~[na:1.8.0_77]
    at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:608) ~[na:1.8.0_77]
    at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:141) ~[apache-cassandra-3.7.jar:3.7]
    ... 10 common frames omitted
DEBUG [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 ConnectionHandler.java:110 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Closing stream connection handler on /10.192.116.1
INFO  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamResultFuture.java:187 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Session with /10.192.116.1 is complete
WARN  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamResultFuture.java:214 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Stream failed


the dataset is approx 300 GB per node.

does that mean that cassandra does not try to reconnect (for streaming) in case
of short network dropouts?
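
To see whether a stalled rebuild matches a failed stream session on a source node, one option is to scan that node's system.log for ERROR and WARN entries that carry a stream ID. The sketch below is only an illustration: it assumes one log entry per line in the layout shown in this message (unwrapped, as in the real log file), and the function name and regular expression are hypothetical helpers, not part of Cassandra.

```python
import re
from collections import defaultdict

# Matches entries such as:
# WARN  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamResultFuture.java:214 - [Stream #<uuid>] Stream failed
ENTRY = re.compile(
    r"^(?P<level>ERROR|WARN)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+\S+\s+-\s+"
    r"\[Stream #(?P<stream_id>[0-9a-f-]{36})\]\s+(?P<msg>.*)$"
)


def stream_problems(log_lines):
    """Group ERROR/WARN stream entries by stream ID as (timestamp, level, message) tuples."""
    problems = defaultdict(list)
    for line in log_lines:
        match = ENTRY.match(line.rstrip("\n"))
        if match:
            problems[match["stream_id"]].append((match["ts"], match["level"], match["msg"]))
    return dict(problems)


SAMPLE_LOG = [
    "ERROR [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamSession.java:529 - "
    "[Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Streaming error occurred on session with peer "
    "10.192.116.1 through 192.168.0.114",
    "INFO  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamResultFuture.java:187 - "
    "[Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Session with /10.192.116.1 is complete",
    "WARN  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamResultFuture.java:214 - "
    "[Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Stream failed",
]

for stream_id, entries in stream_problems(SAMPLE_LOG).items():
    print(stream_id, [message for _, _, message in entries])
```

The stream ID found this way can be compared with the Rebuild ID that nodetool netstats reports on the receiving node.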


Re: cassandra node stops streaming data during nodetool rebuild

2017-04-07 Thread Jacob Shadix
Did you look at the logs on the source DC as well? How big is the dataset?

-- Jacob Shadix


Re: cassandra node stops streaming data during nodetool rebuild

2017-04-07 Thread Roland Otta
Hi!

we are on 3.7.

we have some debug messages ... but i guess they are not related to that issue
DEBUG [GossipStage:1] 2017-04-07 13:11:00,440 FailureDetector.java:456 - Ignoring interval time of 2002469610 for /192.168.0.27
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - Ignoring interval time of 2598593732 for /10.192.116.4
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - Ignoring interval time of 2002612298 for /10.192.116.5
DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - Ignoring interval time of 2002660534 for /10.192.116.9
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - Ignoring interval time of 2027212880 for /10.192.116.3
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - Ignoring interval time of 2027279042 for /192.168.0.188
DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - Ignoring interval time of 2027313992 for /10.192.116.10

besides that the debug.log is clean

all the mentioned cassandra.yaml parameters are the shipped defaults
(streaming_socket_timeout_in_ms does not exist at all in my cassandra.yaml).
i also checked the pending compactions. there are no pending compactions at the
moment.

bg - roland otta


Re: cassandra node stops streaming data during nodetool rebuild

2017-04-07 Thread Jacob Shadix
What version are you running? Do you see any errors in the system.log
(SocketTimeout, for instance)?

And what values do you have for the following in cassandra.yaml:
- stream_throughput_outbound_megabits_per_sec
- compaction_throughput_mb_per_sec
- streaming_socket_timeout_in_ms
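
One way to answer this quickly is to list which of the three settings are explicitly set in cassandra.yaml and which are commented out or missing, in which case the built-in default applies. The sketch below is an illustration under simple assumptions: the settings are top-level `key: value` lines, and a plain line scan is used instead of a YAML parser. The function name and the sample values are hypothetical.

```python
import re

KEYS = (
    "stream_throughput_outbound_megabits_per_sec",
    "compaction_throughput_mb_per_sec",
    "streaming_socket_timeout_in_ms",
)

# Top-level "key: value" line, optionally followed by a trailing comment.
SETTING = re.compile(r"^([A-Za-z0-9_]+):\s*(.*?)\s*(?:#.*)?$")


def explicit_settings(yaml_text, keys=KEYS):
    """Map each key to its explicitly configured value, or None if commented out or absent."""
    found = {key: None for key in keys}
    for line in yaml_text.splitlines():
        match = SETTING.match(line)
        if match and match.group(1) in found:
            found[match.group(1)] = match.group(2)
    return found


# Made-up sample; the values are placeholders, not the defaults of any Cassandra release.
SAMPLE_YAML = """\
cluster_name: 'Test Cluster'
# stream_throughput_outbound_megabits_per_sec: 123
compaction_throughput_mb_per_sec: 42  # placeholder value
"""

for key, value in explicit_settings(SAMPLE_YAML).items():
    print(f"{key}: {value if value is not None else 'not set (default applies)'}")
```

On a real node the text would come from the cassandra.yaml that the running process actually loaded.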

-- Jacob Shadix

On Fri, Apr 7, 2017 at 6:00 AM, Roland Otta 
wrote:

> hi,
>
> we are trying to set up a new datacenter and are initializing the data
> with nodetool rebuild.
>
> after some hours it seems that the node stopped streaming (at least
> there is no more streaming traffic on the network interface).
>
> nodetool netstats shows that the streaming is still in progress
>
> Mode: NORMAL
> Bootstrap 6918dc90-1ad6-11e7-9f16-51230e2be4e9
> Rebuild 41606030-1ad9-11e7-9f16-51230e2be4e9
>     /192.168.0.26
>         Receiving 257 files, 145444246572 bytes total. Already received 1 files, 1744027 bytes total
>             bds/adcounter_total 76456/47310255 bytes(0%) received from idx:0/192.168.0.26
>             bds/upselling_event 1667571/1667571 bytes(100%) received from idx:0/192.168.0.26
>     /192.168.0.188
>     /192.168.0.27
>         Receiving 169 files, 79355302464 bytes total. Already received 1 files, 81585975 bytes total
>             bds/ad_event_history 81585975/81585975 bytes(100%) received from idx:0/192.168.0.27
>     /192.168.0.189
>         Receiving 140 files, 19673034809 bytes total. Already received 1 files, 5996604 bytes total
>             bds/adcounter_per_day 5956840/42259846 bytes(14%) received from idx:0/192.168.0.189
>             bds/user_event 39764/39764 bytes(100%) received from idx:0/192.168.0.189
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name                    Active   Pending      Completed   Dropped
> Large messages                  n/a         2              3         0
> Small messages                  n/a         0       68632465         0
> Gossip messages                 n/a         0         217661         0
>
>
>
> it has been in that state for approx 15 hours now
>
> does it make sense waiting for the streaming to finish or do i have to
> restart the node, discard data and restart the rebuild?
>
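
To tell a slow rebuild from a stalled one, the "Receiving ..." summary lines of two nodetool netstats snapshots taken some time apart can be compared. The sketch below is a minimal illustration: it assumes the summary-line wording shown in the output above and IPv4-style peer lines, and every name in it (PeerProgress, parse_receiving, is_stalled) is hypothetical rather than part of any Cassandra tooling.

```python
import re
from typing import List, NamedTuple


class PeerProgress(NamedTuple):
    peer: str
    files_total: int
    bytes_total: int
    files_received: int
    bytes_received: int


# A peer line ("/192.168.0.26") directly followed by its "Receiving ..." summary.
RECEIVING = re.compile(
    r"(?<!\S)/(\S+) Receiving (\d+) files, (\d+) bytes total\. "
    r"Already received (\d+) files, (\d+) bytes total"
)


def parse_receiving(netstats_text: str) -> List[PeerProgress]:
    """Extract per-peer receive totals from the text printed by nodetool netstats."""
    flat = " ".join(netstats_text.split())  # undo indentation and line wrapping
    return [
        PeerProgress(m.group(1), *(int(g) for g in m.groups()[1:]))
        for m in RECEIVING.finditer(flat)
    ]


def is_stalled(earlier: str, later: str) -> bool:
    """True if no additional bytes were received between two snapshots."""
    received = lambda text: sum(p.bytes_received for p in parse_receiving(text))
    return received(later) <= received(earlier)


SAMPLE = """\
Rebuild 41606030-1ad9-11e7-9f16-51230e2be4e9
    /192.168.0.26
        Receiving 257 files, 145444246572 bytes total. Already received 1 files, 1744027 bytes total
    /192.168.0.188
    /192.168.0.27
        Receiving 169 files, 79355302464 bytes total. Already received 1 files, 81585975 bytes total
    /192.168.0.189
        Receiving 140 files, 19673034809 bytes total. Already received 1 files, 5996604 bytes total
"""

for p in parse_receiving(SAMPLE):
    print(f"{p.peer}: {100 * p.bytes_received / p.bytes_total:.2f}% of bytes received")
```

Two snapshots with identical received totals hours apart, together with a "Stream failed" entry on a source node, suggest the session will not complete on its own.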