> > Forgot to set replication for new data center :(
I had a feeling that could be it :-). From the other thread:

> It should be run from DC3 servers, after altering keyspace to add
> keyspaces to the new datacenter. Is this the way you're doing it?
>
> - Are all the nodes using the same version ('nodetool version')?
> - What does 'nodetool status keyspace_name1' output?
> - Are you sure you are using Network Topology Strategy on
> 'keyspace_name1'? Have you modified this schema to add replication
> on DC3?
>
> My guess is something could be wrong with the configuration.
>

I was starting to wonder about this one though, so thanks for letting us know about it :-).

C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-09-28 23:54 GMT+02:00 techpyaasa . <techpya...@gmail.com>:

> Forgot to set replication for new data center :(
>
> On Wed, Sep 28, 2016 at 11:33 PM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
>
>> What was the reason?
>>
>> On Wed, Sep 28, 2016 at 9:58 AM techpyaasa . <techpya...@gmail.com>
>> wrote:
>>
>>> Very sorry...I got the reason for this issue..
>>> Please ignore.
>>>
>>>
>>> On Wed, Sep 28, 2016 at 10:14 PM, techpyaasa . <techpya...@gmail.com>
>>> wrote:
>>>
>>>> @Paulo
>>>>
>>>> We have made the changes you suggested:
>>>> net.ipv4.tcp_keepalive_time=60
>>>> net.ipv4.tcp_keepalive_probes=3
>>>> net.ipv4.tcp_keepalive_intvl=10
>>>>
>>>> and increased streaming_socket_timeout_in_ms to 48 hours, with
>>>> "phi_convict_threshold : 9".
>>>>
>>>> We then recommissioned the new data center (DC3) once again and ran
>>>> "nodetool rebuild 'DC1'", but this time NO data got streamed and
>>>> 'nodetool rebuild' exited without any exception.
>>>>
>>>> Please check the logs below:
>>>>
>>>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:44,571 StorageService.java (line 914) rebuild from dc: IDC
>>>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520 StreamResultFuture.java (line 87) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Executing streaming plan for Rebuild
>>>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,521 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.75
>>>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.132
>>>> INFO [StreamConnectionEstablisher:1] 2016-09-28 09:18:47,522 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.75
>>>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.133
>>>> INFO [StreamConnectionEstablisher:2] 2016-09-28 09:18:47,522 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.132
>>>> INFO [StreamConnectionEstablisher:3] 2016-09-28 09:18:47,523 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.133
>>>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,523 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.167
>>>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.78
>>>> INFO [StreamConnectionEstablisher:4] 2016-09-28 09:18:47,524 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.167
>>>> INFO [StreamConnectionEstablisher:5] 2016-09-28 09:18:47,525 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.78
>>>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.126
>>>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,525 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.191
>>>> INFO [StreamConnectionEstablisher:6] 2016-09-28 09:18:47,526 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.126
>>>> INFO [StreamConnectionEstablisher:7] 2016-09-28 09:18:47,526 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.191
>>>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,526 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.168
>>>> INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,527 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.169
>>>> INFO [StreamConnectionEstablisher:8] 2016-09-28 09:18:47,527 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.168
>>>> INFO [StreamConnectionEstablisher:9] 2016-09-28 09:18:47,528 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.169
>>>> INFO [STREAM-IN-/xxx.xxx.198.132] 2016-09-28 09:18:47,713 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.132 is complete
>>>> INFO [STREAM-IN-/xxx.xxx.198.191] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.191 is complete
>>>> INFO [STREAM-IN-/xxx.xxx.198.133] 2016-09-28 09:18:47,716 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.133 is complete
>>>> INFO [STREAM-IN-/xxx.xxx.198.169] 2016-09-28 09:18:47,716 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.169 is complete
>>>> INFO [STREAM-IN-/xxx.xxx.198.167] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.167 is complete
>>>> INFO [STREAM-IN-/xxx.xxx.198.126] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.126 is complete
>>>> INFO [STREAM-IN-/xxx.xxx.198.78] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.78 is complete
>>>> INFO [STREAM-IN-/xxx.xxx.198.168] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.168 is complete
>>>> INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,776 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.75 is complete
>>>> INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,778 StreamResultFuture.java (line 220) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] All sessions completed
>>>>
>>>>
>>>> As you can see in the logs above, nodetool rebuild finished without
>>>> any data being streamed, and all streaming sessions completed in NO TIME
>>>> (see the timestamps in the logs).
>>>>
>>>>
>>>> Also, "nodetool status" looks fine from the new nodes (from which
>>>> I ran 'nodetool rebuild').
>>>>
>>>> Please let us know what could be the issue here.
>>>>
>>>> Thanks in advance.
>>>>
>>>> On Wed, Sep 28, 2016 at 1:04 AM, Paulo Motta <pauloricard...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yeah this is likely to be caused by idle connections being shut down,
>>>>> so you may need to update your tcp_keepalive* and/or network/firewall
>>>>> settings.
>>>>>
>>>>>
>>>>> 2016-09-27 15:29 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:
>>>>>
>>>>>> Hi Paulo,
>>>>>>
>>>>>> Thanks for the reply...
>>>>>>
>>>>>> I'm getting the following streaming exceptions during nodetool rebuild in
>>>>>> c*-2.0.17:
>>>>>>
>>>>>> 04:24:49,759 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>>>>> java.io.IOException: Connection timed out
>>>>>>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>>>>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>>>>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>>>>>>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>>>>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>>>>>>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>>>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>> DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 ConnectionHandler.java (line 104) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection handler on /xxx.xxx.98.168
>>>>>> INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamResultFuture.java (line 186) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is complete
>>>>>> ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>>>>> java.io.IOException: Broken pipe
>>>>>>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>>>>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>>>>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>>>>>>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>>>>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>>>>>>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>>>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>> DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 ConnectionHandler.java (line 244) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId: 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys: 4736, transfer size: 2306880, compressed?: true), file: /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)
>>>>>> ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>>>>> java.lang.RuntimeException: Outgoing stream handler has been closed
>>>>>>     at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)
>>>>>>     at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)
>>>>>>     at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)
>>>>>>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>>
>>>>>> On Sep 27, 2016 11:48 PM, "Paulo Motta" <pauloricard...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> What type of streaming timeout are you getting? Do you have a stack
>>>>>>> trace? What version are you on?
>>>>>>>
>>>>>>> See more information about tuning tcp_keepalive* here:
>>>>>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>>>>>>>
>>>>>>> 2016-09-27 14:07 GMT-03:00 laxmikanth sadula <laxmikanth...@gmail.com>:
>>>>>>>
>>>>>>>> @Paulo Motta
>>>>>>>>
>>>>>>>> We are also facing streaming timeout exceptions during 'nodetool
>>>>>>>> rebuild'. I set streaming_socket_timeout_in_ms to 86400000 (24 hours)
>>>>>>>> as suggested in the DataStax blog -
>>>>>>>> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
>>>>>>>> but we are still getting streaming exceptions.
>>>>>>>>
>>>>>>>> What are the recommended settings/values for the kernel
>>>>>>>> tcp_keepalive parameters that would help streaming succeed?
>>>>>>>>
>>>>>>>> Thank you
>>>>>>>>
>>>>>>>> On Tue, Aug 16, 2016 at 12:21 AM, Paulo Motta <
>>>>>>>> pauloricard...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> What version are you on? This seems like a typical case where there
>>>>>>>>> was a problem with streaming (hanging, etc.). Do you have access to the
>>>>>>>>> logs? Maybe look for streaming errors? Typically streaming errors are
>>>>>>>>> related to timeouts, so you should review your cassandra
>>>>>>>>> streaming_socket_timeout_in_ms and kernel tcp_keepalive settings.
>>>>>>>>>
>>>>>>>>> If you're on 2.2+ you can resume a failed bootstrap with nodetool
>>>>>>>>> bootstrap resume. There were also some streaming hanging problems
>>>>>>>>> fixed recently, so I'd advise you to upgrade to the latest version of
>>>>>>>>> your particular series for a more robust version.
>>>>>>>>>
>>>>>>>>> Is there any reason why you didn't use the replace procedure
>>>>>>>>> (-Dreplace_address) to replace the node with the same tokens? This
>>>>>>>>> would be a bit faster than the remove + bootstrap procedure.
>>>>>>>>>
>>>>>>>>> 2016-08-15 15:37 GMT-03:00 Jérôme Mainaud <jer...@mainaud.com>:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> A client of mine has problems adding a node to the cluster.
>>>>>>>>>> After 4 days, the node is still in joining mode, it doesn't have
>>>>>>>>>> the same level of load as the others, and there seems to be no
>>>>>>>>>> streaming from or to the new node.
>>>>>>>>>>
>>>>>>>>>> This node has a history.
>>>>>>>>>>
>>>>>>>>>> 1. At the beginning, it was a seed in the cluster.
>>>>>>>>>> 2. Ops detected that clients had problems with it.
>>>>>>>>>> 3. They tried to reset it but failed. In the process, they
>>>>>>>>>> launched several repair and rebuild processes on the node.
>>>>>>>>>> 4. Then they asked me to help them.
>>>>>>>>>> 5. We stopped the node,
>>>>>>>>>> 6. removed it from the list of seeds (more precisely it was
>>>>>>>>>> replaced by another node),
>>>>>>>>>> 7. removed it from the cluster (I chose not to use
>>>>>>>>>> decommission since node data was compromised)
>>>>>>>>>> 8. deleted all files from data, commitlog and savedcache
>>>>>>>>>> directories.
>>>>>>>>>> 9. after the leaving process ended, it was started as a fresh
>>>>>>>>>> new node and began autobootstrap.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> As I don't have direct access to the cluster I don't have a lot
>>>>>>>>>> of information, but I will have more tomorrow (logs and results of
>>>>>>>>>> some commands). And I can ask people for any required information.
>>>>>>>>>>
>>>>>>>>>> Does someone have any idea of what could have happened and what I
>>>>>>>>>> should investigate first?
>>>>>>>>>> What would you do to unlock the situation?
>>>>>>>>>>
>>>>>>>>>> Context: The cluster consists of two DCs, each with 15 nodes.
>>>>>>>>>> Average load is around 3 TB per node. The joining node froze a
>>>>>>>>>> little after 2 TB.
>>>>>>>>>>
>>>>>>>>>> Thank you for your help.
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jérôme Mainaud
>>>>>>>>>> jer...@mainaud.com
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Laxmikanth
>>>>>>>> 99621 38051
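
For anyone who lands on this thread with the same symptom ('nodetool rebuild' returning almost immediately with nothing streamed): the cause reported at the top was that replication had not been set for the new data center. Below is a minimal sketch of building the corresponding ALTER KEYSPACE statement, assuming the keyspace uses NetworkTopologyStrategy. The helper name and the replication factor of 3 are illustrative assumptions, not values from this thread, and the data center names must match what 'nodetool status' reports on your cluster.

```python
from typing import Mapping


def alter_keyspace_cql(keyspace: str, replication: Mapping[str, int]) -> str:
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy.

    `replication` maps data center name -> replication factor.
    This is a hypothetical helper, not part of any Cassandra tool.
    """
    if not replication:
        raise ValueError("at least one data center is required")
    # Sort the data centers so the generated statement is deterministic.
    options = ", ".join(
        f"'{dc}': {rf}" for dc, rf in sorted(replication.items())
    )
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Assumed replication factors; the thread does not state the real ones.
statement = alter_keyspace_cql("keyspace_name1", {"DC1": 3, "DC3": 3})
print(statement)
```

The printed statement would be run in cqlsh for each keyspace that should live in the new data center, before running 'nodetool rebuild' from the DC3 nodes as suggested above.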