Re: [Linux-HA] What does "(unmanaged) FAILED" mean in crm_mon ?

Alan Robertson Fri, 13 Apr 2007 10:55:46 -0700

Benjamin Watine wrote:
> I'll answer my own question :)
> 
> It was simply a timeout on stopping mysqld. I changed it from 3s to 5s
> and it runs well now.
> 
> But I don't understand why HB doesn't move the resource group to the
> other node instead of ceasing to manage it (and leaving it not working).
> Imagine MySQL is broken and the init script takes a long time to reply:
> HB _should_ let the other node take over the resource group. It should
> do this in every error case.
> Maybe I'm misunderstanding something here...



A stop failure is very serious.

It means we don't know whether the resource is still running or not.  And
it would be expecting too much for the monitor action to report accurately
whether the resource is really completely stopped after we kill the
resource script.

The only guaranteed way to recover from a stop failure is to reboot the
machine.

If the stop timed out at 3 seconds, I would make the stop timeout 30
seconds, or 50 seconds.  I certainly wouldn't make it 5 seconds.  If the
machine is under heavy load, stopping could take a long time.
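
The reasoning above can be illustrated with a small sketch. This is not
Heartbeat code; stop_resource and safe_to_start_elsewhere are hypothetical
names made up for the illustration, and the "stop scripts" are stand-in
Python one-liners. It only shows why a stop that times out leaves the
resource in an unknown state, and why an unknown state rules out starting
the resource on another node.

```python
import subprocess
import sys


def stop_resource(stop_cmd, timeout_s):
    """Run a stop command and classify the outcome (illustrative only).

    Returns "stopped" on exit code 0, "failed" on a non-zero exit code,
    and "unknown" if the command did not finish within timeout_s seconds.
    """
    try:
        result = subprocess.run(stop_cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the stop script on timeout, but killing the
        # script says nothing about the service it was trying to stop.
        return "unknown"
    return "stopped" if result.returncode == 0 else "failed"


def safe_to_start_elsewhere(state):
    # Only a confirmed stop rules out the resource running on two nodes.
    return state == "stopped"


if __name__ == "__main__":
    # Stand-in for an init script that needs about 2 seconds to stop.
    slow_stop = [sys.executable, "-c", "import time; time.sleep(2)"]

    for timeout_s in (0.5, 30):
        state = stop_resource(slow_stop, timeout_s)
        print(timeout_s, state, safe_to_start_elsewhere(state))
```

With the short timeout the same stop script ends up "unknown", while the
generous timeout lets it finish and report a clean stop, which is the
argument for 30 seconds rather than 5.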

-- 
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions." - William
Wilberforce