Probably too aggressive a default, yes, but the unit of measure is seconds:

# hz.cluster.max.heartbeat.seconds=5
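For illustration, here is that line uncommented with the suggested value. 300 seconds lines up with the 5-minute Hazelcast default Tom mentions, while the shipped value of 5 is the "5000 ms" that appears in his log. This assumes the setting is edited in the same properties file (e.g. cas.properties) that holds the commented-out line. Only the property name and the values 5 and 300 come from this thread.

```properties
# Presumably the maximum number of seconds a member may go without sending a
# heartbeat before it is dropped (inferred from the "5000 ms" log message);
# raised from the shipped, commented-out default of 5
hz.cluster.max.heartbeat.seconds=300
```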
Enable that property and set it to 300.

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Tom Poage
> Sent: Thursday, June 2, 2016 8:56 AM
> To: CAS Community <[email protected]>
> Subject: Re: [cas-user] Hazelcast heartbeat timeout?
>
> So it seems the default heartbeat timeout in Hazelcast is 5 minutes, but the
> default heartbeat timeout in CAS is 5 seconds.
>
> Purposeful (rationale?), or a scaling error?
>
> Thanks!
> Tom.
>
> > On Jun 2, 2016, at 8:12 AM, Tom Poage <[email protected]> wrote:
> >
> > Morning,
> >
> > We started running 4.2.1 w/ Hazelcast (hz.cluster.tcpip.enabled=true) on Linux VMs (RedHat variant) a couple of weeks ago with three nodes on the same subnet. Things seemed fine initially, but a couple of days ago started getting cluster errors starting with heartbeat timeout, several (dis)connects, attempted repartitions, and ending with the cluster frozen.
> >
> > Has anyone experienced this? E.g.
> >
> >> 2016-06-01 21:01:25,330 WARN [com.hazelcast.cluster.impl.ClusterHeartbeatManager] - [-------.50]:5701 [dev] [3.6] Removing Member [------.55]:5701 because it has not sent any heartbeats for 5000 ms. Last heartbeat time was Wed Jun 01 21:01:20 PDT 2016
> >> 2016-06-01 21:01:25,330 INFO [com.hazelcast.cluster.ClusterService] - [------.50]:5701 [dev] [3.6] Old master Address[------.55]:5701 left the cluster, assigning new master Member [128.120.39.50]:5701 this
> > ...
> >> 2016-06-01 21:01:29,167 WARN [com.hazelcast.partition.InternalPartitionService] - [------.50]:5701 [dev] [3.6] This is the master node and received a PartitionRuntimeState from Address[------.55]:5701. Ignoring incoming state!
> > ...
> >> 2016-06-01 21:05:16,046 INFO [com.hazelcast.cluster.impl.operations.JoinCheckOperation] - [------.50]:5701 [dev] [3.6] Ignoring join check from Address[------.55]:5701, because cluster is in FROZEN state ...
> >
> > Interestingly enough, if we shut down one of the nodes (leaving two), the issue does not recur--at least in the time we've been monitoring.
> >
> > The only recourse seems to be a full cluster restart.
> >
> > Thanks for any advice!
> >
> > Tom.
> >
> > --
> > You received this message because you are subscribed to the Google Groups "CAS Community" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> > To post to this group, send email to [email protected].
> > Visit this group at https://groups.google.com/a/apereo.org/group/cas-user/.
> > To view this discussion on the web visit https://groups.google.com/a/apereo.org/d/msgid/cas-user/B8F20E5F-0BC3-44AE-B53F-BCFD1B181E3D%40ucdavis.edu.
> > For more options, visit https://groups.google.com/a/apereo.org/d/optout.
