I'm looking for someone more experienced who can explain an issue we had last night.
Our backups kicked in during the night at 1:00 AM. At 1:01 AM, our MySQL cluster had issues. Specifically, crm_mon shows the cluster marked the resource as failed due to an "unknown exec error".

Looking at the performance of the node, I can see that I/O wait went through the roof at 1:00 AM when the TSM backups kicked in. I can also see that this caused heartbeat issues because MySQL was late checking its instances: the check generally takes a few seconds, but in this case it took 3 minutes.

Of course this is all due to the extremely high I/O wait, but I am curious: why didn't the cluster fail over? Why put MySQL in an unmanaged state and simply report an "unknown exec error"?

Thanks for any comments.

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems