Hi,

During testing, I noticed that a time step made by ntpd caused the cluster to drop into GATHER state:

Jun 16 12:13:16 cp1edidbm001 ntpd[30917]: time reset -16.332117 s
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering GATHER state from 12.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Creating commit token because I am the rep.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Saving state aru 9e high seq received 9e
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Storing new sequence id for ring 328
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering COMMIT state.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering RECOVERY state.
...

This is easily repeatable by setting the clock forwards by 20 seconds using /bin/date. My guess is that the step makes comms timeouts expire prematurely. It causes the cluster to reconfigure almost every time, luckily without affecting running services.

Stepping the clock backwards causes a similar disruption, but there is a long lag between changing the time and the cluster reconfiguring. Perhaps the step extends a timeout or sleep on the affected node, which then causes genuine timeouts on the other nodes.

All I am looking for is some reassurance that clock changes are not going to crash the cluster. Is anyone able to confirm this, please?

Regards,
Martin
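
P.S. To make the suspected mechanism concrete, here is a minimal Python sketch. It is not openais code, and I have not checked which clock openais actually uses for its timers. FakeHost, expired, run and the 10 second timeout are all made up for illustration. The sketch only shows how a timeout measured against the wall clock reacts to a step, compared with one measured against a monotonic clock.

```python
# Illustration only: a simulated host with a steppable wall clock.
# TIMEOUT_S is a hypothetical value, not an openais default.
TIMEOUT_S = 10.0


class FakeHost:
    """Simulated host: uptime always moves forward, wall time can be stepped."""

    def __init__(self):
        self.uptime = 0.0               # behaves like a monotonic clock
        self.wall_offset = 1_000_000.0  # arbitrary wall-clock starting point

    def sleep(self, seconds):
        # Real time passing: both clocks advance.
        self.uptime += seconds

    def step_clock(self, seconds):
        # What ntpd or /bin/date does: only the wall clock jumps.
        self.wall_offset += seconds

    def wall_time(self):
        return self.wall_offset + self.uptime

    def monotonic(self):
        return self.uptime


def expired(start, now, timeout=TIMEOUT_S):
    """True once at least `timeout` seconds separate `start` and `now`."""
    return (now - start) >= timeout


def run(step_seconds, real_elapsed):
    """Start a timer, step the clock, let real time pass.

    Returns (wall_clock_says_expired, monotonic_says_expired).
    """
    host = FakeHost()
    wall_start, mono_start = host.wall_time(), host.monotonic()
    host.step_clock(step_seconds)
    host.sleep(real_elapsed)
    return (
        expired(wall_start, host.wall_time()),
        expired(mono_start, host.monotonic()),
    )


if __name__ == "__main__":
    # Forward step of 20 s, only 1 s of real time has passed.
    print(run(20.0, 1.0))         # (True, False): wall-clock timer fires early
    # Backward step of about 16.3 s, 12 s of real time has passed.
    print(run(-16.332117, 12.0))  # (False, True): wall-clock timer fires late
```

If the timers were based on a monotonic clock (for example clock_gettime with CLOCK_MONOTONIC in C), neither step would change when they fire, which is why I suspect the wall clock is involved.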
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
