@Paulo

We made the changes you suggested:
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_intvl=10

We also increased streaming_socket_timeout_in_ms to 48 hours and set
"phi_convict_threshold: 9".
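
As a rough check of what these values imply, here is a small sketch. It assumes the usual Linux keepalive behaviour (the first probe is sent after tcp_keepalive_time seconds of idleness, then up to tcp_keepalive_probes probes follow, tcp_keepalive_intvl seconds apart). The helper name is hypothetical and used only for illustration:

```python
# Illustrative arithmetic only; dead_peer_detection_seconds is a hypothetical
# helper, not part of Cassandra or the kernel.

def dead_peer_detection_seconds(keepalive_time, keepalive_probes, keepalive_intvl):
    """Approximate seconds before an idle connection to a dead peer is dropped."""
    return keepalive_time + keepalive_probes * keepalive_intvl

# Values taken from the sysctl settings above.
detection = dead_peer_detection_seconds(60, 3, 10)

# 48 hours expressed in milliseconds, the unit of streaming_socket_timeout_in_ms.
streaming_socket_timeout_in_ms = 48 * 60 * 60 * 1000

print(detection)                       # 90
print(streaming_socket_timeout_in_ms)  # 172800000
```

So with these settings an idle connection sends a keepalive probe every minute, which should keep a firewall from treating it as idle as long as the firewall's idle timeout is longer than that.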

We then recommissioned the new data center (DC3) once again and ran
"nodetool rebuild 'DC1'", but this time NO data was streamed and
'nodetool rebuild' exited without any exception.

Please check the logs below:

*INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:44,571
StorageService.java (line 914) rebuild from dc: IDC*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520
StreamResultFuture.java (line 87) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Executing streaming plan for Rebuild*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,521
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.75*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.132*
* INFO [StreamConnectionEstablisher:1] 2016-09-28 09:18:47,522
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.75*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.133*
* INFO [StreamConnectionEstablisher:2] 2016-09-28 09:18:47,522
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.132*
* INFO [StreamConnectionEstablisher:3] 2016-09-28 09:18:47,523
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.133*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,523
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.167*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.78*
* INFO [StreamConnectionEstablisher:4] 2016-09-28 09:18:47,524
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.167*
* INFO [StreamConnectionEstablisher:5] 2016-09-28 09:18:47,525
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.78*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.126*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,525
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.191*
* INFO [StreamConnectionEstablisher:6] 2016-09-28 09:18:47,526
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.126*
* INFO [StreamConnectionEstablisher:7] 2016-09-28 09:18:47,526
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.191*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,526
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.168*
* INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,527
StreamResultFuture.java (line 91) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
/xxx.xxx.198.169*
* INFO [StreamConnectionEstablisher:8] 2016-09-28 09:18:47,527
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.168*
* INFO [StreamConnectionEstablisher:9] 2016-09-28 09:18:47,528
StreamSession.java (line 214) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
/xxx.xxx.198.169*
* INFO [STREAM-IN-/xxx.xxx.198.132] 2016-09-28 09:18:47,713
StreamResultFuture.java (line 186) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.132 is
complete*
* INFO [STREAM-IN-/xxx.xxx.198.191] 2016-09-28 09:18:47,715
StreamResultFuture.java (line 186) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.191 is
complete*
* INFO [STREAM-IN-/xxx.xxx.198.133] 2016-09-28 09:18:47,716
StreamResultFuture.java (line 186) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.133 is
complete*
* INFO [STREAM-IN-/xxx.xxx.198.169] 2016-09-28 09:18:47,716
StreamResultFuture.java (line 186) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.169 is
complete*
* INFO [STREAM-IN-/xxx.xxx.198.167] 2016-09-28 09:18:47,715
StreamResultFuture.java (line 186) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.167 is
complete*
* INFO [STREAM-IN-/xxx.xxx.198.126] 2016-09-28 09:18:47,715
StreamResultFuture.java (line 186) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.126 is
complete*
* INFO [STREAM-IN-/xxx.xxx.198.78] 2016-09-28 09:18:47,715
StreamResultFuture.java (line 186) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.78 is
complete*
* INFO [STREAM-IN-/xxx.xxx.198.168] 2016-09-28 09:18:47,715
StreamResultFuture.java (line 186) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.168 is
complete*
* INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,776
StreamResultFuture.java (line 186) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.75 is
complete*
* INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,778
StreamResultFuture.java (line 220) [Stream
#3a47f8d0-8597-11e6-bd17-3f6744d54a01] All sessions completed*


As you can see from the logs above, nodetool rebuild finished without any
data being streamed, and all streaming sessions completed in NO TIME (see
the timestamps in the logs).
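
To put a number on it, here is a small sketch that computes the elapsed time from the timestamps in the log lines above. The elapsed_ms helper is hypothetical, and the timestamp format is assumed to match the log excerpt:

```python
from datetime import datetime, timedelta

# Timestamp layout as it appears in the log excerpt (comma before milliseconds).
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

def elapsed_ms(start, end):
    """Whole milliseconds between two log timestamps (hypothetical helper)."""
    delta = datetime.strptime(end, LOG_TS_FORMAT) - datetime.strptime(start, LOG_TS_FORMAT)
    return delta // timedelta(milliseconds=1)

# "rebuild from dc" line -> "All sessions completed" line
rebuild_to_done = elapsed_ms("2016-09-28 09:18:44,571", "2016-09-28 09:18:47,778")

# "Executing streaming plan" line -> "All sessions completed" line
plan_to_done = elapsed_ms("2016-09-28 09:18:47,520", "2016-09-28 09:18:47,778")

print(rebuild_to_done)  # 3207
print(plan_to_done)     # 258
```

That is roughly 3.2 seconds for the whole rebuild and about a quarter of a second for all nine streaming sessions.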


Also, "nodetool status" looks fine from these new nodes (from which I ran
'nodetool rebuild').

Please let us know what could be the issue here.

Thanks in advance.

On Wed, Sep 28, 2016 at 1:04 AM, Paulo Motta <pauloricard...@gmail.com>
wrote:

> Yeah this is likely to be caused by idle connections being shut down, so
> you may need to update your tcp_keepalive* and/or network/firewall settings.
>
>
> 2016-09-27 15:29 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:
>
>> Hi Paulo,
>>
>> Thanks for the reply...
>>
>> I'm getting the following streaming exceptions during nodetool rebuild in
>> c*-2.0.17:
>>
>> *04:24:49,759 StreamSession.java (line 461) [Stream
>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred*
>> *java.io.IOException: Connection timed out*
>> *    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)*
>> *    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)*
>> *    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)*
>> *    at sun.nio.ch.IOUtil.write(IOUtil.java:65)*
>> *    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)*
>> *    at
>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)*
>> *    at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)*
>> *    at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)*
>> *    at java.lang.Thread.run(Thread.java:745)*
>> *DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
>> ConnectionHandler.java (line 104) [Stream
>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection handler on
>> /xxx.xxx.98.168*
>> * INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
>> StreamResultFuture.java (line 186) [Stream
>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is
>> complete*
>> *ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
>> StreamSession.java (line 461) [Stream
>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred*
>> *java.io.IOException: Broken pipe*
>> *    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)*
>> *    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)*
>> *    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)*
>> *    at sun.nio.ch.IOUtil.write(IOUtil.java:65)*
>> *    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)*
>> *    at
>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)*
>> *    at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)*
>> *    at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)*
>> *    at java.lang.Thread.run(Thread.java:745)*
>> *DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909
>> ConnectionHandler.java (line 244) [Stream
>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId:
>> 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys:
>> 4736, transfer size: 2306880, compressed?: true), file:
>> /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)*
>> *ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909
>> StreamSession.java (line 461) [Stream
>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred*
>> *java.lang.RuntimeException: Outgoing stream handler has been closed*
>> *    at
>> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)*
>> *    at
>> org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)*
>> *    at
>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)*
>> *    at
>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)*
>> *    at java.lang.Thread.run(Thread.java:745)*
>>
>> On Sep 27, 2016 11:48 PM, "Paulo Motta" <pauloricard...@gmail.com> wrote:
>>
>>> What type of streaming timeout are you getting? Do you have a stack
>>> trace? What version are you in?
>>>
>>> See more information about tuning tcp_keepalive* here:
>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>>>
>>> 2016-09-27 14:07 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:
>>>
>>>> @Paulo Motta
>>>>
>>>> We are also facing streaming timeout exceptions during 'nodetool
>>>> rebuild'. I set streaming_socket_timeout_in_ms to 86400000 (24 hours) as
>>>> suggested in the DataStax blog:
>>>> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
>>>> but we are still getting streaming exceptions.
>>>>
>>>> What are the recommended settings/values for kernel tcp_keepalive
>>>> that would help streaming succeed?
>>>>
>>>> Thank you
>>>>
>>>> On Tue, Aug 16, 2016 at 12:21 AM, Paulo Motta <pauloricard...@gmail.com
>>>> > wrote:
>>>>
>>>>> What version are you in? This seems like a typical case where there was
>>>>> a problem with streaming (hanging, etc), do you have access to the logs?
>>>>> Maybe look for streaming errors? Typically streaming errors are related to
>>>>> timeouts, so you should review your cassandra
>>>>> streaming_socket_timeout_in_ms and kernel tcp_keepalive settings.
>>>>>
>>>>> If you're on 2.2+ you can resume a failed bootstrap with nodetool
>>>>> bootstrap resume. There were also some streaming hanging problems fixed
>>>>> recently, so I'd advise you to upgrade to the latest version of your
>>>>> particular series for a more robust version.
>>>>>
>>>>> Is there any reason why you didn't use the replace procedure
>>>>> (-Dreplace_address) to replace the node with the same tokens? This would
>>>>> be a bit faster than remove + bootstrap procedure.
>>>>>
>>>>> 2016-08-15 15:37 GMT-03:00 Jérôme Mainaud <jer...@mainaud.com>:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> A client of mine has problems when adding a node to the cluster.
>>>>>> After 4 days, the node is still in joining mode, it doesn't have the
>>>>>> same level of load as the others, and there seems to be no streaming
>>>>>> from or to the new node.
>>>>>>
>>>>>> This node has a history.
>>>>>>
>>>>>>    1. At the beginning, it was a seed in the cluster.
>>>>>>    2. Ops detected that client had problems with it.
>>>>>>    3. They tried to reset it but failed. In their process they
>>>>>>    launched several repair and rebuild processes on the node.
>>>>>>    4. Then they asked me to help them.
>>>>>>    5. We stopped the node,
>>>>>>    6. removed it from the list of seeds (more precisely it was
>>>>>>    replaced by another node),
>>>>>>    7. removed it from the cluster (I chose not to use decommission
>>>>>>    since node data was compromised)
>>>>>>    8. deleted all files from data, commitlog and savedcache
>>>>>>    directories.
>>>>>>    9. after the leaving process ended, it was started as a fresh new
>>>>>>    node and began autobootstrap.
>>>>>>
>>>>>>
>>>>>> As I don’t have direct access to the cluster, I don't have a lot of
>>>>>> information, but I will have more tomorrow (logs and results of some
>>>>>> commands). And I can ask people for any required information.
>>>>>>
>>>>>> Does someone have any idea of what could have happened and what I
>>>>>> should investigate first?
>>>>>> What would you do to unblock the situation?
>>>>>>
>>>>>> Context: The cluster consists of two DC, each with 15 nodes. Average
>>>>>> load is around 3 TB per node. The joining node froze a little after 2 TB.
>>>>>>
>>>>>> Thank you for your help.
>>>>>> Cheers,
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jérôme Mainaud
>>>>>> jer...@mainaud.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Laxmikanth
>>>> 99621 38051
>>>>
>>>>
>>>
>
