On Tue, Jun 14, 2011 at 05:45:21PM +0200, Raoul Bhatia [IPAX] wrote:
> this caused errors for the initial probe, so i did the following change:
>
> > LSB_STATUS_STOPPED=3
> > if [ $ret -ne $OCF_SUCCESS ] || ocf_is_probe; then
> (see the new ocf_is_probe?)
> >     case $1 in
> >         stop)    exit $OCF_SUCCESS ;;
> >         monitor) exit $OCF_NOT_RUNNING ;;
> >         status)  exit $LSB_STATUS_STOPPED ;;
> >         *)       exit $ret ;;
> >     esac
> > fi
>
> so we always enter this case in the event of a probe. this correctly
> handles the initial probe and returns OCF_NOT_RUNNING so that pacemaker
> can continue.
>
>
> *but* the command "crm resource reprobe" is also considered a probe by
> ocf_is_probe. thus, this block will return OCF_NOT_RUNNING on *every*
> node: on the standby node *not* running postfix (which is ok), but also
> on the node which actually *is* running postfix. (and it would also
> return OCF_NOT_RUNNING if postfix was started at system bootup...)
>
> this lets the cluster believe the resource is not running and - because
> of my configuration - the resource will be (re)started on the last
> known location/node (which in fact is still running postfix).
>
> i hope i managed to explain it properly. :)
Yep.
That code is clearly broken.
A probe (regardless of "initial", "manual" or for whatever reason) has
to correctly report the current status. Your probe always returns "not
running".
Fix that ;-)
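
A minimal sketch of what "report the current status" could look like,
assuming a pid-based check. service_is_running and monitor_rc are
hypothetical names, not functions from the postfix agent or from
ocf-shellfuncs; only the two return code values are the standard OCF
ones. The point is that the result depends on the actual state only,
never on whether the call happens to be a probe:

```shell
#!/bin/sh
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical status check: succeeds only if the pid given in $1
# belongs to a live process. A real agent would use its own check.
service_is_running() {
    [ -n "$1" ] && kill -0 "$1" 2>/dev/null
}

# Print the return code for monitor. Probe or regular monitor makes no
# difference here: the answer reflects the current state.
monitor_rc() {
    if service_is_running "$1"; then
        echo "$OCF_SUCCESS"
    else
        echo "$OCF_NOT_RUNNING"
    fi
}
```

An ocf_is_probe test may still be useful to decide how to treat a
failed validation (say, a missing binary on a node that never runs the
service), but it should not replace the status check itself.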
Dejan:
can we have an additional branch in ocf-tester
that checks something like
"probe if stopped" returns $OCF_NOT_RUNNING,
"probe if started" returns $OCF_SUCCESS?
Lars
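
P.S. A rough sketch of the check I have in mind for ocf-tester. This is
not existing ocf-tester code; expect_probe and check_probe are
hypothetical names. It assumes the agent recognizes a probe as a monitor
call with OCF_RESKEY_CRM_meta_interval=0, and that the agent command
passed in can be called with stop, start and monitor as its last
argument:

```shell
#!/bin/sh
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Run "<agent...> monitor" as a probe and compare its exit code with
# the expected one given in $1. The subshell keeps the probe
# environment from leaking into later calls.
expect_probe() {
    expected=$1; shift
    ( OCF_RESKEY_CRM_meta_interval=0
      export OCF_RESKEY_CRM_meta_interval
      "$@" monitor ) >/dev/null 2>&1
    rc=$?
    if [ "$rc" -ne "$expected" ]; then
        echo "probe returned $rc, expected $expected" >&2
        return 1
    fi
}

# "probe if stopped" must return OCF_NOT_RUNNING,
# "probe if started" must return OCF_SUCCESS.
check_probe() {
    "$@" stop >/dev/null 2>&1
    expect_probe "$OCF_NOT_RUNNING" "$@" || return 1
    "$@" start >/dev/null 2>&1 || return 1
    expect_probe "$OCF_SUCCESS" "$@" || return 1
    "$@" stop >/dev/null 2>&1
}
```

An agent that answers "not running" to every probe, like the snippet
quoted above, would fail the second expect_probe call.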
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.