On 06/10/2011 09:07 AM, Dejan Muhamedagic wrote:
>> mhm ... one problem is that i cannot distinguish between an inital
>> probe and a probe from "crm resource reprobe".
>>
>> when i do this, my current postfix ocf ra reports "not running",
>
> Even though it is started? Well, that sounds like a problem. But
> I don't really understand. You mentioned at the beginning of the
> thread that it is this error:
>
>> ERROR: Postfix configuration directory '/data/mail/conf' does not exist. 3
>
> If the resource runs then the directory must be present, right?

i'll try to start all over ;)

i have two nodes, a and b, in a failover configuration. the configuration
resides on shared storage which is only mounted on the active node:

> primitive m-mail-fs ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/mail" directory="/data/mail/" \
>         fstype="ext4" options="nosuid,nodev,noatime,nodiratime" \
>         op stop interval="0" timeout="60" \
>         op start interval="0" timeout="60"
> primitive m-mail-postfix ocf:ipax:postfix \
>         op monitor interval="30" timeout="30" \
>         params config_dir="/data/mail/conf/" \
>         meta target-role="Started"
> group group-mail-base m-mail-fs m-mail-postfix

at [1] you find the most current revision of my resource agent.

looking at VirtualDomain, i got an idea about handling the initial probe
by using ocf_is_probe to determine whether this is a probe or not:
if it is a probe, the checks do not generate an error (line 217ff), and
some checks aren't even run (e.g. postfix check, line 286ff).

so, in case of a probe, postfix_validate_all() will return OCF_SUCCESS.
(btw. before my changes, postfix_validate_all would return some
OCF_ERR_xxx instead)

then, i formerly used to run the following check:

> LSB_STATUS_STOPPED=3
> if [ $ret -ne $OCF_SUCCESS ]; then
>     case $1 in
>     stop)       exit $OCF_SUCCESS ;;
>     monitor)    exit $OCF_NOT_RUNNING;;
>     status)     exit $LSB_STATUS_STOPPED;;
>     *)          exit $ret;;
>     esac
> fi

means: if monitor/status was issued and validation did not return
OCF_SUCCESS, we return OCF_NOT_RUNNING (afairc, this was actually handling
the probing situation before ocf_is_probe was available).

because my changes to postfix_validate_all() introduce ocf_is_probe and
return OCF_SUCCESS, i no longer enter this case during a probe. this
caused errors for the initial probe, so i made the following change:

> LSB_STATUS_STOPPED=3
> if [ $ret -ne $OCF_SUCCESS ] || ocf_is_probe; then
(see the new ocf_is_probe?)
>     case $1 in
>     stop)       exit $OCF_SUCCESS ;;
>     monitor)    exit $OCF_NOT_RUNNING;;
>     status)     exit $LSB_STATUS_STOPPED;;
>     *)          exit $ret;;
>     esac
> fi

so we always enter this case in the event of a probe. this correctly
handles the initial probe and returns OCF_NOT_RUNNING so that pacemaker
can continue.

*but* a "crm resource reprobe" also makes ocf_is_probe return true.
thus, this block returns OCF_NOT_RUNNING on *every* node: on the standby
node that is *not* running postfix (which is ok), but also on the node
which actually *is* running postfix.
(and it would also return OCF_NOT_RUNNING if postfix was started at
system bootup...)

this lets the cluster believe the resource is not running and - because
of my configuration - the resource will be (re)started on the last known
location/node (which in fact is still running postfix).

i hope i managed to explain it properly. :)

one way to tackle this would be a means to distinguish the initial
probe from a "manual" probe.

i could also revert my probing changes and live with an error of e.g.
> ocf_log err "Postfix configuration directory '$config_dir' does not exist or is not readable."
> return $OCF_ERR_INSTALLED
instead of
> ocf_log info "Postfix configuration directory '$config_dir' not readable during probe."

but this isn't quite what i want...

another possibility would be to rewrite/drop this probe. but i don't
quite know how to do that properly.

suggestions are welcome!

cheers,
raoul

[1]
https://github.com/raoulbhatia/resource-agents/blob/master/heartbeat/postfix
-- 
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc.          email.          [email protected]
Technischer Leiter

IPAX - Aloy Bhatia Hava OG          web.          http://www.ipax.at
Barawitzkagasse 10/2/2/11           email.          [email protected]
1190 Wien                           tel.            +43 1 3670030
FN 277995t HG Wien                  fax.            +43 1 3670030 15
____________________________________________________________________
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
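
A minimal sketch of the "rewrite the probe" option, not the agent's actual code. It rests on one assumption the thread does not confirm: a missing or unreadable config directory during a probe simply means the shared storage is not mounted on this node, so this postfix instance cannot be running here. In every other case the probe falls through to the real status check, so an initial probe and a "crm resource reprobe" no longer need to be told apart. The function monitor_sketch, the stubs ocf_is_probe and postfix_running, and the variables IS_PROBE and POSTFIX_UP are hypothetical stand-ins so the sketch runs outside a cluster; a real agent would source ocf-shellfuncs and use its own status check.

```shell
# Stand-ins for values normally provided by ocf-shellfuncs.
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

# Hypothetical stub: the real ocf_is_probe comes from ocf-shellfuncs.
ocf_is_probe() { [ "${IS_PROBE:-0}" = "1" ]; }

# Hypothetical stub for the agent's real status check.
postfix_running() { [ "${POSTFIX_UP:-0}" = "1" ]; }

# monitor_sketch CONFIG_DIR
monitor_sketch() {
    config_dir=$1

    if [ ! -d "$config_dir" ] || [ ! -r "$config_dir" ]; then
        if ocf_is_probe; then
            # Assumption: storage is not mounted here, so nothing can be
            # running from it. Report "not running" instead of an error.
            return $OCF_NOT_RUNNING
        fi
        # Outside a probe, a missing config directory is a real error.
        return $OCF_ERR_INSTALLED
    fi

    # The directory is present: always ask the real status check,
    # whether this is a probe, a reprobe, or a recurring monitor.
    if postfix_running; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```

With this shape, a reprobe on the active node reaches postfix_running and reports OCF_SUCCESS, while the standby node (no mounted storage) still reports OCF_NOT_RUNNING.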
