Re: Error while rebuilding a node: Stream failed

2016-06-02 Thread George Sigletos
I gave up completely with rebuild. Now I am running `nodetool repair` and in case of network issues I retry for the token ranges that failed using the -st and -et options of `nodetool repair`. That would be good enough for now, till we fix our network problems. On Sat, May 28, 2016 at 7:05 PM,
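A minimal sketch of the retry approach described above, assuming the failed token ranges have already been collected by hand from the repair output. The helper names (`build_repair_command`, `run_nodetool`, `repair_with_retries`) and the keyspace name are hypothetical; only `nodetool repair` with `-st` and `-et` comes from the message. It also assumes that the exit status of `nodetool` reflects whether the repair succeeded, which may not hold on every version.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple

TokenRange = Tuple[int, int]  # (start token, end token)


def build_repair_command(keyspace: str, token_range: TokenRange) -> List[str]:
    """Build the argument list for a repair limited to one token range."""
    start, end = token_range
    return ["nodetool", "repair", "-st", str(start), "-et", str(end), keyspace]


def run_nodetool(command: List[str]) -> bool:
    """Run the command and report success (assumption: exit code 0 means OK)."""
    return subprocess.run(command).returncode == 0


def repair_with_retries(
    keyspace: str,
    ranges: Iterable[TokenRange],
    run: Callable[[List[str]], bool] = run_nodetool,
    max_attempts: int = 3,
) -> List[TokenRange]:
    """Repair each range, retrying only the ones that failed.

    Returns the ranges that still failed after max_attempts passes.
    """
    pending = list(ranges)
    for _ in range(max_attempts):
        if not pending:
            break
        # Keep only the ranges whose repair command reported failure.
        pending = [r for r in pending if not run(build_repair_command(keyspace, r))]
    return pending
```

Passing the runner as a parameter keeps the retry logic testable without a live cluster; for example, `repair_with_retries("my_keyspace", [(0, 100), (100, 200)])` would invoke `nodetool` once per range and retry only the ranges that failed.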

Re: Error while rebuilding a node: Stream failed

2016-05-28 Thread George Sigletos
No luck unfortunately. It seems that the connection to the destination node was lost. However there was progress compared to the previous times. A lot more data was streamed. (From source node) INFO [GossipTasks:1] 2016-05-28 17:53:57,155 Gossiper.java:1008 - InetAddress /54.172.235.227 is now

Re: Error while rebuilding a node: Stream failed

2016-05-27 Thread George Sigletos
I am trying once more using more aggressive tcp settings, as recommended here `sudo sysctl -w net.ipv4.tcp_keepalive_time=60 net.ipv4.tcp_keepalive_probes=3 net.ipv4.tcp_keepalive_intvl=10` (added to
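To see what these settings change, here is a rough sketch of the keepalive arithmetic on Linux: the first probe is sent after `tcp_keepalive_time` seconds of idleness, and an unresponsive peer is given up on after `tcp_keepalive_probes` further probes spaced `tcp_keepalive_intvl` seconds apart. The function name is hypothetical, and the 75-second interval used for the second case is the Linux default, assumed here because the thread does not state it.

```python
def keepalive_detection_seconds(time_s: int, probes: int, intvl_s: int) -> int:
    """Approximate time for an idle connection to a dead peer to be dropped.

    time_s:  idle seconds before the first keepalive probe is sent
    probes:  unanswered probes tolerated before giving up
    intvl_s: seconds between successive probes
    """
    return time_s + probes * intvl_s


# Settings from the sysctl command above: first probe after 60 s of idleness.
aggressive = keepalive_detection_seconds(60, 3, 10)

# A 2-hour keepalive time with 9 probes, assuming the default 75 s interval.
defaults = keepalive_detection_seconds(7200, 9, 75)

print(aggressive, defaults)
```

The part that matters for intermediate routers and firewalls is the first term: with a 60-second keepalive time an otherwise idle streaming connection carries traffic every minute, whereas with a 2-hour value it can sit silent long enough to be dropped.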

Re: Error while rebuilding a node: Stream failed

2016-05-27 Thread Paulo Motta
I'm afraid raising streaming_socket_timeout_in_ms won't help much in this case because the incoming connection on the source node is timing out on the network layer, and streaming_socket_timeout_in_ms controls the socket timeout in the app layer and throws SocketTimeoutException (not

Re: Error while rebuilding a node: Stream failed

2016-05-27 Thread Sebastian Estevez
Check ifconfig for dropped tcp messages. Let's rule out your network. all the best, Sebastián On May 27, 2016 10:45 AM, "George Sigletos" wrote: > Hello, > > No there is no version mix. The first stack traces were indeed from > 2.1.13. Then I upgraded all nodes to

Re: Error while rebuilding a node: Stream failed

2016-05-27 Thread George Sigletos
Hello, No there is no version mix. The first stack traces were indeed from 2.1.13. Then I upgraded all nodes to 2.1.14. Still getting the same errors On Fri, May 27, 2016 at 4:39 PM, Eric Evans wrote: > From the various stacktraces in this thread, it's obvious you

Re: Error while rebuilding a node: Stream failed

2016-05-27 Thread Eric Evans
From the various stacktraces in this thread, it's obvious you are mixing versions 2.1.13 and 2.1.14. Topology changes like this aren't supported with mixed Cassandra versions. Sometimes it will work, sometimes it won't (and it will definitely not work in this instance). You should either

Re: Error while rebuilding a node: Stream failed

2016-05-27 Thread George Sigletos
Still failing. Should I maybe set a higher value for streaming_socket_timeout_in_ms? Maybe 2-3 days? Source: node ERROR [STREAM-OUT-/54.172.235.227] 2016-05-27 14:30:34,401 StreamSession.java:505 - [Stream #45017970-234c-11e6-9452-1b05ac77baf9] Streaming error occurred java.lang.AssertionError:

Re: Error while rebuilding a node: Stream failed

2016-05-26 Thread George Sigletos
The time the first streaming failure occurs varies from a few hours to 1+ day. We also experience slowness problems with the destination node on Amazon. Rebuild is slow. That may also contribute to the problem. Unfortunately we only kept the logs of the source node and there is no other error

Re: Error while rebuilding a node: Stream failed

2016-05-26 Thread Paulo Motta
How long does it take after you trigger the rebuild process before it fails? Was there any error before [STREAM-IN-/192.168.1.141] on the destination node or [STREAM-OUT-/172.31.22.104] on the source node? Those are showing consequences of the root error. In particular what were the last messages

Re: Error while rebuilding a node: Stream failed

2016-05-26 Thread George Sigletos
I tried again after setting streaming_socket_timeout_in_ms to 1 day on all nodes and upgrading to 2.1.14. My tcp_keepalive_time is set to 2 hours and tcp_keepalive_probes to 9. That should be ok, I believe. I get a streaming error again, shortly after starting the rebuild process.

Re: Error while rebuilding a node: Stream failed

2016-05-25 Thread Paulo Motta
If increasing or disabling streaming_socket_timeout_in_ms on the source node does not fix it, you may want to have a look at your tcp keepalive settings on the source and destination nodes, as intermediate routers/firewalls may be killing the connections due to inactivity. See this for more

Re: Error while rebuilding a node: Stream failed

2016-05-25 Thread George Sigletos
Thanks a lot for your help. I will try that tomorrow. The first time that I tried to rebuild, streaming_socket_timeout_in_ms was 0 and it still failed. Below is the immediately preceding error on the source node: ERROR [STREAM-IN-/172.31.22.104] 2016-05-24 22:32:20,437 StreamSession.java:505 - [Stream

Re: Error while rebuilding a node: Stream failed

2016-05-25 Thread Paulo Motta
> Workaround is to set a larger streaming_socket_timeout_in_ms **on the source node**; the new default will be 86400000ms (1 day). 2016-05-25 17:23 GMT-03:00 Paulo Motta : > Was there any other ERROR preceding this on this node (in particular the > last few lines of
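Since the setting is expressed in milliseconds, it is easy to drop a few zeros when typing it by hand. A small sketch of the conversion, assuming the value goes into cassandra.yaml on the source node; the helper name `streaming_timeout_ms` is hypothetical.

```python
from datetime import timedelta


def streaming_timeout_ms(days: float) -> int:
    """Convert a duration in days to milliseconds for streaming_socket_timeout_in_ms."""
    return int(timedelta(days=days).total_seconds() * 1000)


# One day, as suggested in the workaround.
one_day = streaming_timeout_ms(1)

# Line as it would appear in cassandra.yaml (assumed location of the setting).
print(f"streaming_socket_timeout_in_ms: {one_day}")
```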

Re: Error while rebuilding a node: Stream failed

2016-05-25 Thread Paulo Motta
Was there any other ERROR preceding this on this node (in particular the last few lines of [STREAM-IN-/172.31.22.104])? If it's a SocketTimeoutException, then what is happening is that the default streaming socket timeout of 1 hour is not sufficient to stream a single file and the stream session

Re: Error while rebuilding a node: Stream failed

2016-05-25 Thread George Sigletos
Hello again, Here is the error message from the source INFO [STREAM-IN-/172.31.22.104] 2016-05-25 00:44:57,275 StreamResultFuture.java:180 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9] Session with /172.31.22.104 is complete WARN [STREAM-IN-/172.31.22.104] 2016-05-25 00:44:57,276

Re: Error while rebuilding a node: Stream failed

2016-05-25 Thread Paulo Motta
This is the log of the destination/rebuilding node, you need to check what is the error message on the stream source node (192.168.1.140). 2016-05-25 15:22 GMT-03:00 George Sigletos : > Hello, > > Here is additional stack trace from system.log: > > ERROR

Re: Error while rebuilding a node: Stream failed

2016-05-25 Thread George Sigletos
Hello, Here is additional stack trace from system.log: ERROR [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:57,704 StreamSession.java:620 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9] Remote peer 192.168.1.140 failed stream session. ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:57,705

Re: Error while rebuilding a node: Stream failed

2016-05-25 Thread Paulo Motta
The stack trace from the rebuild command does not show the root cause of the rebuild stream error. Can you check the system.log for ERROR logs during streaming and paste them here?

Re: Error while rebuilding a node: Stream failed

2016-05-25 Thread George Sigletos
Hi Mike, Yes I am using NetworkTopologyStrategy. I checked cassandra-rackdc.properties on the new node: dc=DCamazon-1 rack=RACamazon-1 I also checked the jira link you sent me. My network topology seems correct: I have 4 nodes in DC1 and 1 node in DCamazon-1 and I can verify that when running

Re: Error while rebuilding a node: Stream failed

2016-05-25 Thread Mike Yeap
Hi George, are you using NetworkTopologyStrategy as the replication strategy for your keyspace? If yes, can you check the cassandra-rackdc.properties of this new node? https://issues.apache.org/jira/browse/CASSANDRA-8279 Regards, Mike Yeap On Wed, May 25, 2016 at 2:31 PM, George Sigletos