I'm looking for someone more experienced who can explain an issue we had 
last night.

Our backups kicked in during the night at 1:00 AM. At 1:01 AM, our MySQL 
cluster had issues. Specifically, crm_mon shows that the cluster has it 
marked as failed due to an "unknown exec error". Looking at the node's 
performance, I can see that I/O wait went through the roof at 1:00 AM 
when the TSM backups kicked in. This caused heartbeat issues because 
MySQL was late checking its instances: the check generally takes a few 
seconds, but in this case it took 3 minutes. Of course this is all due 
to the extremely high I/O wait, but I am curious: why didn't the cluster 
fail over? Why put MySQL in an unmanaged state and simply report an 
"unknown exec error"?

Thanks for any comments.
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems