What was the reason?

On Wed, Sep 28, 2016 at 9:58 AM techpyaasa . <techpya...@gmail.com> wrote:
> Very sorry... I found the reason for this issue.
> Please ignore.
>
> On Wed, Sep 28, 2016 at 10:14 PM, techpyaasa . <techpya...@gmail.com> wrote:
>
>> @Paulo
>>
>> We have made the changes you suggested:
>> net.ipv4.tcp_keepalive_time=60
>> net.ipv4.tcp_keepalive_probes=3
>> net.ipv4.tcp_keepalive_intvl=10
>>
>> We also increased streaming_socket_timeout_in_ms to 48 hours and set
>> "phi_convict_threshold : 9".
>>
>> Then we recommissioned the new data center (DC3) once again and ran
>> "nodetool rebuild 'DC1'", but this time NO data was streamed and
>> 'nodetool rebuild' exited without any exception.
>>
>> Please check the logs below:
>>
>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:44,571 StorageService.java (line 914) rebuild from dc: IDC
>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520 StreamResultFuture.java (line 87) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Executing streaming plan for Rebuild
>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,521 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.75
>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.132
>> INFO [StreamConnectionEstablisher:1] 2016-09-28 09:18:47,522 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.75
>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.133
>> INFO [StreamConnectionEstablisher:2] 2016-09-28 09:18:47,522 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.132
>> INFO [StreamConnectionEstablisher:3] 2016-09-28 09:18:47,523 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.133
>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,523 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.167
>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.78
>> INFO [StreamConnectionEstablisher:4] 2016-09-28 09:18:47,524 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.167
>> INFO [StreamConnectionEstablisher:5] 2016-09-28 09:18:47,525 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.78
>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.126
>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,525 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.191
>> INFO [StreamConnectionEstablisher:6] 2016-09-28 09:18:47,526 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.126
>> INFO [StreamConnectionEstablisher:7] 2016-09-28 09:18:47,526 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.191
>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,526 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.168
>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,527 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.169
>> INFO [StreamConnectionEstablisher:8] 2016-09-28 09:18:47,527 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.168
>> INFO [StreamConnectionEstablisher:9] 2016-09-28 09:18:47,528 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.169
>> INFO [STREAM-IN-/xxx.xxx.198.132] 2016-09-28 09:18:47,713 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.132 is complete
>> INFO [STREAM-IN-/xxx.xxx.198.191] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.191 is complete
>> INFO [STREAM-IN-/xxx.xxx.198.133] 2016-09-28 09:18:47,716 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.133 is complete
>> INFO [STREAM-IN-/xxx.xxx.198.169] 2016-09-28 09:18:47,716 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.169 is complete
>> INFO [STREAM-IN-/xxx.xxx.198.167] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.167 is complete
>> INFO [STREAM-IN-/xxx.xxx.198.126] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.126 is complete
>> INFO [STREAM-IN-/xxx.xxx.198.78] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.78 is complete
>> INFO [STREAM-IN-/xxx.xxx.198.168] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.168 is complete
>> INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,776 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.75 is complete
>> INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,778 StreamResultFuture.java (line 220) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] All sessions completed
>>
>> As you can see in the logs above, nodetool rebuild finished without any
>> data being streamed, and all streaming sessions completed almost
>> instantly (see the timestamps in the logs).
>>
>> "nodetool status" also looks fine from these new nodes (the ones from
>> which I ran 'nodetool rebuild').
>>
>> Please let us know what the issue could be.
>>
>> Thanks in advance.
>>
>> On Wed, Sep 28, 2016 at 1:04 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
>>
>>> Yeah, this is likely caused by idle connections being shut down, so
>>> you may need to update your tcp_keepalive* and/or network/firewall settings.
>>>
>>>
>>> 2016-09-27 15:29 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:
>>>
>>>> Hi Paul,
>>>>
>>>> Thanks for the reply...
>>>>
>>>> I'm getting the following streaming exceptions during nodetool rebuild
>>>> in C* 2.0.17:
>>>>
>>>> 04:24:49,759 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>>> java.io.IOException: Connection timed out
>>>>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>>>>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>>>>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 ConnectionHandler.java (line 104) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection handler on /xxx.xxx.98.168
>>>>  INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamResultFuture.java (line 186) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is complete
>>>> ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>>> java.io.IOException: Broken pipe
>>>>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>>>>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>>>>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 ConnectionHandler.java (line 244) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId: 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys: 4736, transfer size: 2306880, compressed?: true), file: /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)
>>>> ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>>> java.lang.RuntimeException: Outgoing stream handler has been closed
>>>>     at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)
>>>>     at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)
>>>>     at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)
>>>>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> On Sep 27, 2016 11:48 PM, "Paulo Motta" <pauloricard...@gmail.com> wrote:
>>>>
>>>>> What type of streaming timeout are you getting? Do you have a stack
>>>>> trace? What version are you on?
>>>>>
>>>>> See more information about tuning tcp_keepalive* here:
>>>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>>>>>
>>>>> 2016-09-27 14:07 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:
>>>>>
>>>>>> @Paulo Motta
>>>>>>
>>>>>> We are also facing streaming timeout exceptions during 'nodetool
>>>>>> rebuild'. I set streaming_socket_timeout_in_ms to 86400000 (24 hours),
>>>>>> as suggested in this DataStax support article:
>>>>>> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
>>>>>> but we are still getting streaming exceptions.
>>>>>>
>>>>>> What are the recommended settings/values for the kernel tcp_keepalive
>>>>>> parameters that would help streaming succeed?
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>> On Tue, Aug 16, 2016 at 12:21 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
>>>>>>
>>>>>>> What version are you on? This looks like a typical case where there
>>>>>>> was a problem with streaming (hanging, etc.). Do you have access to
>>>>>>> the logs? Maybe look for streaming errors? Streaming errors are
>>>>>>> typically related to timeouts, so you should review your cassandra
>>>>>>> streaming_socket_timeout_in_ms and kernel tcp_keepalive settings.
>>>>>>>
>>>>>>> If you're on 2.2+ you can resume a failed bootstrap with nodetool
>>>>>>> bootstrap resume. Some streaming hang problems were also fixed
>>>>>>> recently, so I'd advise you to upgrade to the latest version of your
>>>>>>> particular series for a more robust version.
>>>>>>>
>>>>>>> Is there any reason why you didn't use the replace procedure
>>>>>>> (-Dreplace_address) to replace the node with the same tokens? This
>>>>>>> would be a bit faster than the remove + bootstrap procedure.
>>>>>>>
>>>>>>> 2016-08-15 15:37 GMT-03:00 Jérôme Mainaud <jer...@mainaud.com>:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> A client of mine has problems adding a node to the cluster.
>>>>>>>>
>>>>>>>> After 4 days, the node is still in joining mode, it doesn't have
>>>>>>>> the same load as the others, and there seems to be no streaming to
>>>>>>>> or from the new node.
>>>>>>>>
>>>>>>>> This node has a history:
>>>>>>>>
>>>>>>>> 1. At the beginning, it was a seed in the cluster.
>>>>>>>> 2. Ops detected that the client had problems with it.
>>>>>>>> 3. They tried to reset it but failed. In the process, they
>>>>>>>>    launched several repair and rebuild processes on the node.
>>>>>>>> 4. Then they asked me to help them.
>>>>>>>> 5. We stopped the node,
>>>>>>>> 6. removed it from the list of seeds (more precisely, it was
>>>>>>>>    replaced by another node),
>>>>>>>> 7. removed it from the cluster (I chose not to use decommission
>>>>>>>>    since the node's data was compromised),
>>>>>>>> 8. deleted all files from the data, commitlog and saved cache
>>>>>>>>    directories.
>>>>>>>> 9. After the leaving process ended, it was started as a fresh new
>>>>>>>>    node and began auto-bootstrapping.
>>>>>>>>
>>>>>>>> As I don't have direct access to the cluster, I don't have a lot of
>>>>>>>> information, but I will tomorrow (logs and results of some commands).
>>>>>>>> I can also ask people for any required information.
>>>>>>>>
>>>>>>>> Does anyone have an idea of what could have happened and what I
>>>>>>>> should investigate first?
>>>>>>>> What would you do to unblock the situation?
>>>>>>>>
>>>>>>>> Context: the cluster consists of two DCs, each with 15 nodes.
>>>>>>>> Average load is around 3 TB per node. The joining node froze a
>>>>>>>> little after 2 TB.
>>>>>>>>
>>>>>>>> Thank you for your help.
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jérôme Mainaud
>>>>>>>> jer...@mainaud.com
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Laxmikanth
>>>>>> 99621 38051
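
For reference, a minimal Python sketch of what the keepalive and timeout values discussed in this thread imply. It assumes standard Linux semantics: on a socket with SO_KEEPALIVE enabled, the kernel sends the first probe after `tcp_keepalive_time` seconds of idleness and drops the connection after `tcp_keepalive_probes` unanswered probes spaced `tcp_keepalive_intvl` seconds apart. The helper names and the 1-hour firewall idle timeout are hypothetical illustrations, not values reported in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeepaliveSettings:
    """Linux net.ipv4.tcp_keepalive_* values (hypothetical helper type)."""
    time_s: int   # idle seconds before the first probe (tcp_keepalive_time)
    probes: int   # unanswered probes before the socket is dropped (tcp_keepalive_probes)
    intvl_s: int  # seconds between probes (tcp_keepalive_intvl)

    def first_probe_after(self) -> int:
        return self.time_s

    def dead_peer_detected_after(self) -> int:
        # Worst case: full idle period, then every probe goes unanswered.
        return self.time_s + self.probes * self.intvl_s


def survives_idle_firewall(ka: KeepaliveSettings, firewall_idle_timeout_s: int) -> bool:
    """True if the first probe is sent before a firewall that silently drops
    idle flows after `firewall_idle_timeout_s` seconds would discard the connection."""
    return ka.first_probe_after() < firewall_idle_timeout_s


def hours_to_ms(hours: int) -> int:
    """Convert hours to milliseconds, the unit of streaming_socket_timeout_in_ms."""
    return hours * 60 * 60 * 1000


LINUX_DEFAULTS = KeepaliveSettings(time_s=7200, probes=9, intvl_s=75)
THREAD_VALUES = KeepaliveSettings(time_s=60, probes=3, intvl_s=10)

if __name__ == "__main__":
    hypothetical_firewall_timeout_s = 3600  # assumption: 1-hour idle timeout
    for name, ka in (("linux defaults", LINUX_DEFAULTS), ("thread values", THREAD_VALUES)):
        print(name,
              ka.first_probe_after(),
              ka.dead_peer_detected_after(),
              survives_idle_firewall(ka, hypothetical_firewall_timeout_s))
    print(hours_to_ms(24), hours_to_ms(48))
```

With the values applied above (60/3/10), a probe goes out after one minute of idleness and a dead peer is detected after about 90 seconds, whereas the Linux defaults (7200/9/75) wait two hours before the first probe, which can exceed a firewall's idle timeout. The same sketch converts the 24-hour and 48-hour streaming_socket_timeout_in_ms values into milliseconds (86400000 and 172800000).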