The time the first streaming failure occurs varies from a few hours to 1+
day.

We also experience slowness problems with the destination node on Amazon.
Rebuild is slow. That may also contribute to the problem.

Unfortunately we only kept the logs of the source node and there is no
other error prior to the streaming failure.

Only compaction, flushing and writing memtable info messages.

We are running the rebuild once more using destination node's external IP.
If it fails again I will post the errors here.

On Thu, May 26, 2016 at 5:20 PM, Paulo Motta <pauloricard...@gmail.com>
wrote:

> How long does it take after you trigger the rebuild process before it
> fails?
>
> Was there any error before [STREAM-IN-/192.168.1.141] on the destination
> node or [STREAM-OUT-/172.31.22.104] on the source node? Those are showing
> consequences of the root error. In particular what were the last messages
> on [STREAM-OUT-/192.168.1.141] and [STREAM-IN-/172.31.22.104] ?
>
> > Streaming does not seem to be resumed again from this node. Shall I just
> kill again the entire rebuild process?
>
> Yes, resumable rebuild will be supported on CASSANDRA-10810.
>
> 2016-05-26 8:20 GMT-03:00 George Sigletos <sigle...@textkernel.nl>:
>
>> I tried again with setting streaming_socket_timeout_in_ms to 1 day on all
>> nodes and after having upgraded to 2.1.14.
>>
>> My tcp_keep_alive_time is set to 2 hours and tcp_keepalive_probes to 9.
>> That should be ok I would believe.
>>
>> I get streaming error again, shortly after starting the rebuild process.
>> This is from the destination node:
>>
>> ERROR [STREAM-IN-/192.168.1.141] 2016-05-26 09:08:05,027
>> StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9]
>> Streaming error occurred
>> java.lang.RuntimeException: Outgoing stream handler has been closed
>>         at
>> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>         at
>> org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:568)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>         at
>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:457)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>         at
>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:263)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>
>> And this is from the source node:
>>
>> ERROR [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,097
>> StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9]
>> Streaming error occurred
>> java.io.IOException: Broken pipe
>>         at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>> ~[na:1.7.0_79]
>>         at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
>> ~[na:1.7.0_79]
>>         at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
>> ~[na:1.7.0_79]
>>         at
>> org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:84)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>         at
>> org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:88)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>         at
>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:49)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>         at
>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>         at
>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>         at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:358)
>> [apache-cassandra-2.1.14.jar:2.1.14]
>>         at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:330)
>> [apache-cassandra-2.1.14.jar:2.1.14]
>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>> INFO  [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,111
>> StreamResultFuture.java:180 - [Stream
>> #74c57bc0-231a-11e6-a698-1b05ac77baf9] Session with /172.31.22.104 is
>> complete
>> WARN  [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,114
>> StreamResultFuture.java:207 - [Stream
>> #74c57bc0-231a-11e6-a698-1b05ac77baf9] Stream failed
>>
>> > Streaming does not seem to be resumed again from this node. Shall I
>> just kill again the entire rebuild process?
>>
>
>> On Thu, May 26, 2016 at 12:17 AM, Paulo Motta <pauloricard...@gmail.com>
>> wrote:
>>
>>> If increasing or disabling streaming_socket_timeout_in_ms on the source
>>> node does not fix it, you may want to have a look on your tcp keep alive
>>> settings on the source and destination nodes as intermediate
>>> routers/firewalls may be killing the connections due to inactivity. See
>>> this for more information:
>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>>>
>>> This will ultimately fixed by CASSANDRA-11841 by adding keep-alive to
>>> the streaming protocol.
>>>
>>> 2016-05-25 18:09 GMT-03:00 George Sigletos <sigle...@textkernel.nl>:
>>>
>>>> Thanks a lot for your help. I will try that tomorrow. The first time
>>>> that I tried to rebuild, streaming_socket_timeout_in_ms was 0 and still
>>>> failed. Below is the directly previous error on the source node:
>>>>
>>>> ERROR [STREAM-IN-/172.31.22.104] 2016-05-24 22:32:20,437
>>>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>> Streaming error occurred
>>>> java.io.IOException: Connection timed out
>>>>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>> ~[na:1.7.0_79]
>>>>         at sun.nio.ch.SocketDispatcher.read(Unknown Source)
>>>> ~[na:1.7.0_79]
>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
>>>> ~[na:1.7.0_79]
>>>>         at sun.nio.ch.IOUtil.read(Unknown Source) ~[na:1.7.0_79]
>>>>         at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
>>>> ~[na:1.7.0_79]
>>>>         at
>>>> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51)
>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>         at
>>>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:250)
>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>>
>>>> On Wed, May 25, 2016 at 10:28 PM, Paulo Motta <pauloricard...@gmail.com
>>>> > wrote:
>>>>
>>>>> > Workaround is to set to a larger streaming_socket_timeout_in_ms **on
>>>>> the source node**., the new default will be 86400000ms (1 day).
>>>>>
>>>>> 2016-05-25 17:23 GMT-03:00 Paulo Motta <pauloricard...@gmail.com>:
>>>>>
>>>>>> Was there any other ERROR preceding this on this node (in particular
>>>>>> the last few lines of [STREAM-IN-/172.31.22.104])? If it's a
>>>>>> SocketTimeoutException, then what is happening is that the default
>>>>>> streaming socket timeout of 1 hour is not sufficient to stream a single
>>>>>> file and the stream session is failed. Workaround is to set to a larger
>>>>>> streaming_socket_timeout_in_ms, the new default will be 86400000ms
>>>>>> (1 day).
>>>>>>
>>>>>> We are addressing this on
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-11839.
>>>>>>
>>>>>> 2016-05-25 16:42 GMT-03:00 George Sigletos <sigle...@textkernel.nl>:
>>>>>>
>>>>>>> Hello again,
>>>>>>>
>>>>>>> Here is the error message from the source
>>>>>>>
>>>>>>> INFO  [STREAM-IN-/172.31.22.104] 2016-05-25 00:44:57,275
>>>>>>> StreamResultFuture.java:180 - [Stream
>>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Session with /172.31.22.104
>>>>>>> is complete
>>>>>>> WARN  [STREAM-IN-/172.31.22.104] 2016-05-25 00:44:57,276
>>>>>>> StreamResultFuture.java:207 - [Stream
>>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Stream failed
>>>>>>> ERROR [STREAM-OUT-/172.31.22.104] 2016-05-25 00:44:57,353
>>>>>>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>>>>> Streaming error occurred
>>>>>>> java.lang.AssertionError: Memory was freed
>>>>>>>         at
>>>>>>> org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:97)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.io.util.Memory.getLong(Memory.java:249)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.io.compress.CompressionMetadata.getTotalSizeForSections(CompressionMetadata.java:247)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.messages.FileMessageHeader.size(FileMessageHeader.java:112)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:546)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:50)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>>>>>
>>>>>>> On Wed, May 25, 2016 at 8:49 PM, Paulo Motta <
>>>>>>> pauloricard...@gmail.com> wrote:
>>>>>>>
>>>>>>>> This is the log of the destination/rebuilding node, you need to
>>>>>>>> check what is the error message on the stream source node 
>>>>>>>> (192.168.1.140).
>>>>>>>>
>>>>>>>>
>>>>>>>> 2016-05-25 15:22 GMT-03:00 George Sigletos <sigle...@textkernel.nl>
>>>>>>>> :
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Here is additional stack trace from system.log:
>>>>>>>>>
>>>>>>>>> ERROR [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:57,704
>>>>>>>>> StreamSession.java:620 - [Stream 
>>>>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>>>>>>> Remote peer 192.168.1.140 failed stream session.
>>>>>>>>> ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:57,705
>>>>>>>>> StreamSession.java:505 - [Stream 
>>>>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>>>>>>> Streaming error occurred
>>>>>>>>> java.io.IOException: Connection timed out
>>>>>>>>>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>>         at sun.nio.ch.SocketDispatcher.write(Unknown Source)
>>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>>         at sun.nio.ch.IOUtil.write(Unknown Source) ~[na:1.7.0_79]
>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48)
>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
>>>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:323)
>>>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>>>>>>> INFO  [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,625
>>>>>>>>> StreamResultFuture.java:180 - [Stream
>>>>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Session with /192.168.1.140
>>>>>>>>> is complete
>>>>>>>>> WARN  [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,627
>>>>>>>>> StreamResultFuture.java:207 - [Stream
>>>>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Stream failed
>>>>>>>>> ERROR [RMI TCP Connection(24)-127.0.0.1] 2016-05-24 22:44:58,628
>>>>>>>>> StorageService.java:1075 - Error while rebuilding node
>>>>>>>>> org.apache.cassandra.streaming.StreamException: Stream failed
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at
>>>>>>>>> com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
>>>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>>>         at
>>>>>>>>> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>>>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>>>         at
>>>>>>>>> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>>>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>>>         at
>>>>>>>>> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>>>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>>>         at
>>>>>>>>> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>>>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208)
>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184)
>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:415)
>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.streaming.StreamSession.sessionFailed(StreamSession.java:621)
>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:475)
>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:256)
>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_79]
>>>>>>>>> ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:58,629
>>>>>>>>> StreamSession.java:505 - [Stream 
>>>>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>>>>>>> Streaming error occurred
>>>>>>>>> java.io.IOException: Broken pipe
>>>>>>>>>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>>         at sun.nio.ch.SocketDispatcher.write(Unknown Source)
>>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>>         at sun.nio.ch.IOUtil.write(Unknown Source) ~[na:1.7.0_79]
>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48)
>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
>>>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at
>>>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331)
>>>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 25, 2016 at 5:23 PM, Paulo Motta <
>>>>>>>>> pauloricard...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> The stack trace from the rebuild command not show the root cause
>>>>>>>>>> of the rebuild stream error. Can you check the system.log for ERROR 
>>>>>>>>>> logs
>>>>>>>>>> during streaming and paste here?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Reply via email to