On 10-11-04 12:38 PM, Dejan Muhamedagic wrote:
> Hi,
>
> On Thu, Nov 04, 2010 at 11:06:48AM -0300, mike wrote:
>    
>> Looking for a more experienced person who can explain this issue we had
>> last night.
>>
>> Our backups kicked in during the night at 1AM. At 1:01AM, our mysql
>> cluster had issues. Specifically I can see in crm_mon where the cluster
>> has it as failed due to an "unknown exec error". Looking at the
>> performance of the node, I can see where wait on I/O went through the
>> roof at 1AM when the tsm backups kicked in. I can see where this caused
>> heartbeat issues because mysql was late checking its instances - it
>> generally takes a few seconds but in this case it took 3 minutes. Of
>> course this is all due to the extremely high wait on I/O but I am
>> curious - why didn't the cluster fail over? Why put MySQL in an
>> unmanaged state and simply say there was an "unknown exec error"?
>>      
> Can't say without looking at the logs and the PE files. One
> possible explanation is that a resource was for whatever reason
> not allowed to run on the other node: a failure in the past
> which didn't expire or a negative location constraint. Or the
> fail count reached migration threshold (if defined).
>
> Thanks,
>
> Dejan
>
>
>    
>> Thanks for any comments
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
>>      
Thanks for the reply, Dejan. I have the failcount threshold set to 3 on
both nodes and, if I understand it correctly, after a third failure the
resource should fail over to the backup node. Correct? What do you mean
by a negative location constraint?

Mike
