So it seems the default heartbeat timeout in Hazelcast is 5 minutes, but the 
default heartbeat timeout in CAS is 5 seconds.

Is that deliberate (and if so, what's the rationale?), or a scaling error?
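
To put numbers on it, here is a minimal sketch of the comparison using the timestamps from the "Removing Member" warning quoted below. It is not Hazelcast's actual code: `is_suspect` is a hypothetical helper, and I'm assuming the check is a simple "silence longer than the timeout" test.

```python
from datetime import datetime, timedelta

# Timestamps copied from the WARN line in the quoted log (local time, PDT).
last_heartbeat = datetime(2016, 6, 1, 21, 1, 20)
checked_at = datetime(2016, 6, 1, 21, 1, 25, 330000)


def is_suspect(last_seen, now, max_no_heartbeat):
    """Hypothetical helper: True once the silence reaches the timeout."""
    return now - last_seen >= max_no_heartbeat


cas_timeout = timedelta(seconds=5)        # the 5000 ms reported in the log
hazelcast_timeout = timedelta(minutes=5)  # the Hazelcast default mentioned above

# 5.33 s of silence trips the 5-second limit but not the 5-minute one.
print(is_suspect(last_heartbeat, checked_at, cas_timeout))
print(is_suspect(last_heartbeat, checked_at, hazelcast_timeout))
```

So a pause of a little over five seconds (a long GC, a VM stall) would be enough to get a member removed with the 5-second value, but would go unnoticed with the 5-minute one.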

Thanks!
Tom.

> On Jun 2, 2016, at 8:12 AM, Tom Poage <[email protected]> wrote:
> 
> Morning,
> 
> We started running 4.2.1 w/ Hazelcast (hz.cluster.tcpip.enabled=true) on 
> Linux VMs (RedHat variant) a couple of weeks ago with three nodes on the same 
> subnet. Things seemed fine initially, but a couple of days ago we started 
> getting cluster errors: first a heartbeat timeout, then several 
> (dis)connects and attempted repartitions, ending with the cluster frozen.
> 
> Has anyone experienced this? E.g.
> 
>> 2016-06-01 21:01:25,330 WARN 
>> [com.hazelcast.cluster.impl.ClusterHeartbeatManager] - [-------.50]:5701 
>> [dev] [3.6] Removing Member [------.55]:5701 because it has not sent any 
>> heartbeats for 5000 ms. Last heartbeat time was Wed Jun 01 21:01:20 PDT 2016
>> 2016-06-01 21:01:25,330 INFO [com.hazelcast.cluster.ClusterService] - 
>> [------.50]:5701 [dev] [3.6] Old master Address[------.55]:5701 left the 
>> cluster, assigning new master Member [128.120.39.50]:5701 this
> ...
>> 2016-06-01 21:01:29,167 WARN 
>> [com.hazelcast.partition.InternalPartitionService] - [------.50]:5701 [dev] 
>> [3.6] This is the master node and received a PartitionRuntimeState from 
>> Address[------.55]:5701. Ignoring incoming state! 
> ...
>> 2016-06-01 21:05:16,046 INFO 
>> [com.hazelcast.cluster.impl.operations.JoinCheckOperation] - 
>> [------.50]:5701 [dev] [3.6] Ignoring join check from 
>> Address[------.55]:5701, because cluster is in FROZEN state ...
> 
> Interestingly enough, if we shut down one of the nodes (leaving two), the 
> issue does not recur--at least in the time we've been monitoring.
> 
> The only recourse seems to be a full cluster restart.
> 
> Thanks for any advice!
> 
> Tom.
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "CAS Community" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/a/apereo.org/group/cas-user/.
> To view this discussion on the web visit 
> https://groups.google.com/a/apereo.org/d/msgid/cas-user/B8F20E5F-0BC3-44AE-B53F-BCFD1B181E3D%40ucdavis.edu.
> For more options, visit https://groups.google.com/a/apereo.org/d/optout.
