Good Morning Cluster Experts,

I have a 3-node cluster running Virtual Machine services. During the full-OS 
backup window (heavy I/O activity), one of the VMs receives a shutdown 
request. This has happened 3 times in 8 weeks, each time to a different VM. I 
assume the cluster is sending the shutdown message. The VM restarts 
immediately afterwards, probably as a result of cluster monitoring.

I checked the messages log. It appears that we are not using a heartbeat, 
since I did not add a <totem/> element to cluster.conf. This version of the 
cluster does not use the openais.conf file; instead, cman is started as a 
service of aisexec (cman 2.0).

Does anyone have suggestions about what to do?

What is sending the shutdown request? Is it groupd?

I have two NICs configured on the nodes. Are one or both IP subnets used for 
the multicast traffic? If only one, which one?

Thanks,

Paul Dyer

P.S. Here is the messages log from a node startup, showing the openais/totem 
portion:
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] AIS Executive Service: started and ready to provide service.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] Using default multicast address of 239.192.48.228
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] send threads (0 threads)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] RRP token expired timeout (495 ms)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] RRP token problem counter (2000 ms)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] RRP threshold (10 problem count)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] RRP mode set to none.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] heartbeat_failures_allowed (0)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] max_network_delay (50 ms)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] The network interface [198.62.216.73] is now up.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Created or loaded sequence id 660.198.62.216.73 for this ring.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] entering GATHER state from 15.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [CMAN ] CMAN 2.0.115 (built Nov 19 2009 10:37:31) started
Mar 15 16:25:16 lxprodas1xen openais[6250]: [MAIN ] Service initialized 'openais CMAN membership service 2.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais extended virtual synchrony service'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais cluster membership service B.01.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais availability management framework B.01.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais checkpoint service B.01.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais event service B.01.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais distributed locking service B.01.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais message service B.01.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais configuration service'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais cluster closed process group service v1.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SERV ] Service initialized 'openais cluster config database access v1.01'
Mar 15 16:25:16 lxprodas1xen openais[6250]: [SYNC ] Not using a virtual synchrony filter.
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Creating commit token because I am the rep.
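
In case it helps anyone reading the log, here is a rough Python sketch that pulls the millisecond timings and the interface address out of the [TOTEM] lines. It is only an illustration, not part of openais or cman: the function name summarize_totem and the regular expressions are my own assumptions, based solely on the line format shown in the log above, and the sample text is a handful of those entries with each one placed on a single line.

```python
import re

# A few [TOTEM] entries copied from the log above, one entry per line.
SAMPLE_LOG = """\
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] Token Timeout (10000 ms) retransmit timeout (495 ms)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] token hold (386 ms) retransmits before loss (20 retrans)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Mar 15 16:25:16 lxprodas1xen openais[6250]: [TOTEM] The network interface [198.62.216.73] is now up.
"""

# Everything after the "[TOTEM]" tag is treated as the message text.
TOTEM_LINE = re.compile(r"\[TOTEM\]\s+(.*)")
# "name (N ms)" pairs; values in other units (retrans, msgs, ...) are skipped.
MS_SETTING = re.compile(r"([A-Za-z_][A-Za-z_ ]*?)\s*\((\d+) ms\)")
# The address totem reports when it brings an interface up.
INTERFACE = re.compile(r"The network interface \[([0-9.]+)\] is now up")


def summarize_totem(log_text):
    """Return (timings_in_ms, interface_addresses) found in [TOTEM] lines."""
    timings = {}
    interfaces = []
    for line in log_text.splitlines():
        match = TOTEM_LINE.search(line)
        if not match:
            continue  # not a totem entry
        message = match.group(1)
        for name, value in MS_SETTING.findall(message):
            timings[name] = int(value)
        interfaces.extend(INTERFACE.findall(message))
    return timings, interfaces


if __name__ == "__main__":
    timings, interfaces = summarize_totem(SAMPLE_LOG)
    for name, value in timings.items():
        print(f"{name}: {value} ms")
    print("interfaces reported up:", interfaces)
```

On the sample it reports a token timeout of 10000 ms and 198.62.216.73 as the only interface totem brought up. Together with the "RRP mode set to none" line, that suggests a single ring on that subnet, though I would welcome confirmation.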
