Hi,

The version is 2.0.17. Yes, these are VMs in the cloud, though I'm fairly certain they are on a LAN rather than a WAN. They are both physically in the same data centre.

The phi_convict_threshold is set to the default. However, I'd rather find the root cause of the problem than just hide it by not convicting a node when it isn't responding. With pings under 2 ms and not a single ping missed in several days, I highly doubt that the network is the reason for the downtime.

Best regards,
Joel

2016-02-23 16:39 GMT+01:00 <sean_r_dur...@homedepot.com>:

> You didn’t mention version, but I saw this kind of thing very often in the
> 1.1 line. Often this is connected to network flakiness. Are these VMs? In
> the cloud? Connected over a WAN? You mention that ping seems fine. Take a
> look at the phi_convict_threshold in cassandra.yaml. You may need to
> increase it to reduce the UP/DOWN flapping behavior.
>
> Sean Durity
>
> *From:* Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
> *Sent:* Tuesday, February 23, 2016 9:41 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Nodes go down periodically
>
> Hi,
>
> Thanks for your reply.
>
> I have debug logging on and see no GC pauses that are that long. GC pauses
> are all well below 1s and 99 times out of 100 below 100ms.
>
> Do I need to enable GC log options to see the pauses?
>
> I see plenty of these lines:
> DEBUG [ScheduledTasks:1] 2016-02-22 10:43:02,891 GCInspector.java (line 118) GC for ParNew: 24 ms for 1 collections
>
> as well as a few CMS GC log lines.
>
> Best regards,
>
> Joel
>
> 2016-02-23 15:14 GMT+01:00 Hannu Kröger <hkro...@gmail.com>:
>
> Hi,
>
> Those are probably GC pauses. Memory tuning is probably needed. Check the
> parameters that you already have customised if they make sense.
>
> http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html
>
> Hannu
>
> On 23 Feb 2016, at 16:08, Joel Samuelsson <samuelsson.j...@gmail.com>
> wrote:
>
> Our nodes go down periodically, around 1-2 times each day. Downtime is
> from <1 second to 30 or so seconds.
>
> INFO [GossipTasks:1] 2016-02-22 10:05:14,896 Gossiper.java (line 992) InetAddress /109.74.13.67 is now DOWN
>
> INFO [RequestResponseStage:8844] 2016-02-22 10:05:38,331 Gossiper.java (line 978) InetAddress /109.74.13.67 is now UP
>
> I find nothing odd in the logs around the same time.
> I logged a ping with timestamp and checked during the same time and saw
> nothing weird (ping is less than 2ms at all times).
>
> Does anyone have any suggestions as to why this might happen?
>
> Best regards,
> Joel
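
For lining up the DOWN/UP episodes against ping or GC timestamps, it may help to pull the exact downtime windows out of the log. The following is only a minimal sketch, assuming each Gossiper message sits on a single line in the format quoted above; the function name `downtime_intervals` and the regular expression are illustrative and not part of Cassandra.

```python
import re
from datetime import datetime

# Matches Gossiper lines shaped like the two quoted in this thread, e.g.
# "INFO [GossipTasks:1] 2016-02-22 10:05:14,896 Gossiper.java (line 992)
#  InetAddress /109.74.13.67 is now DOWN" (assumed to be on one line).
GOSSIP_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) Gossiper\.java \(line \d+\) "
    r"InetAddress /(\S+) is now (UP|DOWN)"
)


def downtime_intervals(lines):
    """Return (address, down_time, up_time, seconds) for each DOWN -> UP pair."""
    down_since = {}  # address -> timestamp of the most recent unmatched DOWN
    intervals = []
    for line in lines:
        match = GOSSIP_RE.search(line)
        if match is None:
            continue  # not a Gossiper UP/DOWN line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        address, state = match.group(2), match.group(3)
        if state == "DOWN":
            # Keep the first DOWN if several arrive before the next UP.
            down_since.setdefault(address, stamp)
        elif address in down_since:
            started = down_since.pop(address)
            intervals.append(
                (address, started, stamp, (stamp - started).total_seconds())
            )
    return intervals
```

Passing the open log file (or any iterable of lines) to `downtime_intervals` yields one tuple per outage. The resulting windows can then be compared with the timestamped ping log and the GCInspector lines for the same period.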