Lars Marowsky-Bree <l...@suse.com> writes:

> "Poisoned resources" indeed should just fail to start and that should be
> that. What instead can happen is that the resource agent notices it
> can't start, reports back to the cluster, and the cluster manager goes
> "Oh no, I couldn't start the resource successfully! It's now possibly in
> a weird state and I better stop it!"
>
> ... And because of the misconfiguration, the *stop* also fails, and
> you're hit with the full power of node-level recovery.
>
> I think this is an issue with some resource agents (if the parameters
> are so bad that the resource couldn't possibly have started, why fail
> the stop?) and possibly also something where one could contemplate a
> better on-fail="" default for "stop in response to first-start failure".

Check out http://www.linux-ha.org/doc/dev-guides/_execution_block.html,
especially the comment "anything other than meta-data and usage must
pass validation".  So if the start action fails with some validation
error, the stop action will fail with it as well.  Is this really good
practice?  Or is OCF_ERR_GENERIC treated differently from the other
errors in this regard, and should the validate action therefore never
return OCF_ERR_GENERIC?
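
To make the question concrete, here is a minimal sketch of an execution
block in which stop is exempted from the "must pass validation" rule.
It is not taken from the guide or from any shipped agent: the demo_*
function names and the "config" parameter are made up for illustration,
and the return codes are defined inline (with their standard OCF values)
instead of sourcing ocf-shellfuncs, so that the fragment runs on its own.

```shell
#!/bin/sh
# Standard OCF return codes, defined here only to keep the sketch standalone.
OCF_SUCCESS=0
OCF_ERR_UNIMPLEMENTED=3
OCF_ERR_CONFIGURED=6
OCF_NOT_RUNNING=7

# Hypothetical validation: the made-up "config" parameter must be set.
demo_validate_all() {
    [ -n "${OCF_RESKEY_config:-}" ] || return $OCF_ERR_CONFIGURED
    return $OCF_SUCCESS
}

# Execution block: meta-data and usage skip validation, as in the guide.
# Assumption under discussion: stop is also let through when validation fails.
demo_dispatch() {
    action=$1
    case $action in
        meta-data|usage) return $OCF_SUCCESS ;;
    esac

    rc=0
    demo_validate_all || rc=$?
    if [ "$rc" -ne "$OCF_SUCCESS" ]; then
        # A resource this badly configured could not have been started,
        # so report the stop as successful instead of repeating the error.
        if [ "$action" = stop ]; then
            return $OCF_SUCCESS
        fi
        return "$rc"
    fi

    case $action in
        start|stop|validate-all) return $OCF_SUCCESS ;;  # placeholders
        monitor) return $OCF_NOT_RUNNING ;;              # placeholder
        *) return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```

A real agent would end with something like 'demo_dispatch "$1"; exit $?'.
With the parameter unset, start returns OCF_ERR_CONFIGURED (6) while stop
returns OCF_SUCCESS (0).  Whether that is safe in general, or only for
parameters that are "so bad that the resource couldn't possibly have
started", is exactly what I am unsure about.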
-- 
Thanks,
Feri.
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
