Hi Raoul,
On Wed, Jul 15, 2009 at 03:42:16PM +0200, Raoul Bhatia [IPAX] wrote:
> hi dejan,
>
> sorry for the late reply, i've been on vacation and am still catching
> up at work.
>
> thanks for your feedback, please see below.
>
> Dejan Muhamedagic wrote:
> > Hi Raoul,
> >
> > Sorry for the delay, somehow I missed the last two messages.
> >
> > On Tue, Jun 23, 2009 at 02:57:52PM +0200, Raoul Bhatia [IPAX] wrote:
> >> Raoul Bhatia [IPAX] wrote:
> >>> i'm reworking my script right now. commenting inline.
> >> i just finished updating the postfix ocf ra and am summarizing the
> >> changes:
> >>
> >> * isRunning() stays as this is also used in other ras
> >> * i left running() as well (where i check the master.pid file)
> >> but am ready to rewrite it to use "postqueue -p" or "postfix status"
> >> in addition or exclusively - waiting for your feedback
> >
> > In addition to testing for the pidfile, you could also check if
> > there's a process holding the spool directory, sth like:
> >
> > rondo:~ # postconf -h queue_directory
> > /var/spool/postfix
> > rondo:~ # fuser /var/spool/postfix/
> > /var/spool/postfix: 5332c 5365c 8313c
> >
> > Perhaps:
> >
> > rondo:~ # fuser -v /var/spool/postfix/ 2>&1 | grep -w master
> > /var/spool/postfix: root 5332 ..c.. master
>
> i'm now checking more in depth for:
> 1. empty queue_directory
For monitor? Why?
> 2. pidfile
> 3. "postfix status"
> 4. postqueue ... | grep 'Mail system is down'
> 5. fuser -v $queue
>
> is this ok? feel free to remove some checks
It should be enough just to check for the process: first using the
pidfile and, if that doesn't work, with fuser.
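Something along these lines, as a rough sketch only. The helper names
pid_alive and master_running are made up for illustration, and the
"pid/master.pid" path assumes the default process_id_directory; in the
agent the queue directory would come from "postconf -h queue_directory".

```shell
#!/bin/sh
# Sketch only: pid_alive and master_running are hypothetical helper names,
# and "pid/master.pid" assumes the default process_id_directory.

# succeed if a process with the given pid exists
pid_alive() {
    [ -n "$1" ] && kill -0 "$1" 2>/dev/null
}

# $1 = queue directory, e.g. the output of "postconf -h queue_directory"
master_running() {
    queue_dir=$1
    pidfile="$queue_dir/pid/master.pid"

    # 1. pidfile: the pid in master.pid may be padded with blanks
    if [ -f "$pidfile" ]; then
        pid=`tr -cd '0-9' < "$pidfile"`
        pid_alive "$pid" && return 0
    fi

    # 2. fallback: is a master process holding the spool directory?
    if command -v fuser >/dev/null 2>&1; then
        fuser -v "$queue_dir" 2>&1 | grep -qw master && return 0
    fi

    return 1
}
```

monitor could then map the result of master_running to OCF_SUCCESS or
OCF_NOT_RUNNING without any of the other checks.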
> >> * i removed $() bashism
> >> * removed "pid=$(sed 's/ //g' ${queue}/pid/master.pid)"
> >> * as of now, removed the postfix_monitor check on "stop"
> >> * waiting 5 seconds for postfix shutdown, then escalating to "abort"
> >> * removed exits inside the functions and replaced them with return.
> >>
> >> did i miss something from your feedback?
> >> do you have any further comments?
> >
> > Lars said:
> >
> >>> if postconf -h queue_directory does not work, this is a broken
> >>> installation and should IMO not provide any other "default"
> >>> value.
> >
> > and I'd agree with this. It's really important that resources are
> > properly configured.
>
> i'm catching this now but am not sure if i'm correctly handling this
> case in "isRunning()".
Just check if that returns a valid directory?
> maybe checking this inside validate_all() is
> good enough?
Not sure. Your preference :)
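If you go with validate_all(), the check could look roughly like this.
It is a sketch only: check_queue_dir is a made-up name, and returning
OCF_ERR_CONFIGURED is my assumption about the fitting error code. The
numeric defaults are there just so the snippet runs outside the agent.

```shell
#!/bin/sh
# Sketch only: check_queue_dir is a hypothetical helper name. The numeric
# defaults mirror OCF_SUCCESS (0) and OCF_ERR_CONFIGURED (6) so the
# snippet also runs without .ocf-shellfuncs being sourced.
: ${OCF_SUCCESS=0}
: ${OCF_ERR_CONFIGURED=6}

# $1 = value returned by "postconf -h queue_directory" (may be empty)
check_queue_dir() {
    if [ -z "$1" ]; then
        echo "queue_directory is not set" >&2
        return $OCF_ERR_CONFIGURED
    fi
    if [ ! -d "$1" ]; then
        echo "queue_directory '$1' is not a directory" >&2
        return $OCF_ERR_CONFIGURED
    fi
    return $OCF_SUCCESS
}
```

With no fallback value, a broken postconf then surfaces as a
configuration error instead of being papered over.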
Thanks,
Dejan
> cheers,
> raoul
> --
> ____________________________________________________________________
> DI (FH) Raoul Bhatia M.Sc. email. [email protected]
> Technischer Leiter
>
> IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
> Barawitzkagasse 10/2/2/11 email. [email protected]
> 1190 Wien tel. +43 1 3670030
> FN 277995t HG Wien fax. +43 1 3670030 15
> ____________________________________________________________________
> #!/bin/sh
> #
> # Resource script for Postfix
> #
> # Description: Manages Postfix as an OCF resource in
> # a high-availability setup.
> #
> # Tested with postfix 2.5.5 on Debian 5.0.
> # Based on the mysql-proxy and mysql OCF resource agents.
> #
> # Author: Raoul Bhatia <[email protected]> : Original Author
> # License: GNU General Public License (GPL)
> # Note: if you want to run multiple postfix instances, please see
> # http://amd.co.at/adminwiki/Postfix#Adding_a_Second_Postfix_Instance_on_one_Server
> # http://www.postfix.org/postconf.5.html
> #
> #
> # usage: $0 {start|stop|reload|status|monitor|validate-all|meta-data}
> #
> # The "start" arg starts a Postfix instance
> #
> # The "stop" arg stops it.
> #
> #
> # Test via
> # * /usr/sbin/ocf-tester -n post1 /usr/lib/ocf/resource.d/heartbeat/postfix
> # * /usr/sbin/ocf-tester -n post1 -o binary="/usr/sbin/postfix" \
> #     -o config_dir="" /usr/lib/ocf/resource.d/heartbeat/postfix
> # * /usr/sbin/ocf-tester -n post1 -o binary="/usr/sbin/postfix" \
> #     -o config_dir="/root/postfix/" /usr/lib/ocf/resource.d/heartbeat/postfix
> #
> #
> # OCF parameters:
> # OCF_RESKEY_binary
> # OCF_RESKEY_config_dir
> # OCF_RESKEY_parameters
> #
> ##########################################################################
>
> # Initialization:
>
> . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
>
> : ${OCF_RESKEY_binary="/usr/sbin/postfix"}
> : ${OCF_RESKEY_config_dir=""}
> : ${OCF_RESKEY_parameters=""}
> USAGE="Usage: $0 {start|stop|reload|status|monitor|validate-all|meta-data}";
>
> ##########################################################################
>
> usage() {
> echo $USAGE >&2
> }
>
> meta_data() {
> cat <<END
> <?xml version="1.0"?>
> <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
> <resource-agent name="postfix">
> <version>0.1</version>
> <longdesc lang="en">
> This script manages Postfix as an OCF resource in a high-availability setup.
> Tested with Postfix 2.5.5 on Debian 5.0.
> </longdesc>
> <shortdesc lang="en">OCF Resource Agent compliant Postfix script.</shortdesc>
>
> <parameters>
>
> <parameter name="binary" unique="0" required="0">
> <longdesc lang="en">
> Full path to the Postfix binary.
> For example, "/usr/sbin/postfix".
> </longdesc>
> <shortdesc lang="en">Full path to Postfix binary</shortdesc>
> <content type="string" default="/usr/sbin/postfix" />
> </parameter>
>
> <parameter name="config_dir" unique="1" required="0">
> <longdesc lang="en">
> Full path to a Postfix configuration directory.
> For example, "/etc/postfix".
> </longdesc>
> <shortdesc lang="en">Full path to configuration directory</shortdesc>
> <content type="string" default="" />
> </parameter>
>
> <parameter name="parameters" unique="0" required="0">
> <longdesc lang="en">
> The Postfix daemon may be called with additional parameters.
> Specify any of them here.
> </longdesc>
> <shortdesc lang="en"></shortdesc>
> <content type="string" default="" />
> </parameter>
>
> </parameters>
>
> <actions>
> <action name="start" timeout="90" />
> <action name="stop" timeout="100" />
> <action name="reload" timeout="100" />
> <action name="monitor" depth="10" timeout="20s" interval="60s"
> start-delay="0" />
> <action name="validate-all" timeout="30s" />
> <action name="meta-data" timeout="5s" />
> </actions>
> </resource-agent>
> END
> }
>
> isRunning()
> {
> kill -0 "$1" 2>/dev/null
> }
>
> # running() has been copied from debian's init script. we enhanced it a bit
> # @TODO rb 2009-06-23 maybe try "postqueue -p 2>&1 | head -n1 | grep 'Mail system is down'" && false
> # @TODO rb 2009-06-23 maybe try "$binary $OPTIONS status" instead?
> running() {
> # look up queue first: pidfile is built from it
> queue=`postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null`
> # an empty queue_directory means a broken installation
> # @TODO shall we return false or $OCF_ERR_something
> [ -z "$queue" ] && return 1
> pid_dir=`postconf $OPTION_CONFIG_DIR -h process_id_directory 2>/dev/null`
> pidfile="${queue}/${pid_dir}/master.pid"
>
> if [ -f "${pidfile}" ]; then
> # @TODO Could the master process become zombie?
> pid=`cat ${pidfile}`
> if isRunning $pid; then
> # @TODO why does "true" not work here?
> #true
> return $OCF_SUCCESS
> fi
> fi
>
> # try some different methods to see if we can find a running postfix/master instance
> # postfix status
> $binary $OPTION_CONFIG_DIR status && return $OCF_SUCCESS
>
> # what does postqueue say?
> echo postqueue $OPTION_CONFIG_DIR -p 2>&1
> # a plain "false" here would not leave the function, so return explicitly
> postqueue $OPTION_CONFIG_DIR -p 2>&1 | head -n1 | grep 'Mail system is down' && return 1
>
> # is there a master process holding the spool directory?
> fuser -v $queue 2>&1 | grep -w master && return $OCF_SUCCESS
>
>
> # Postfix is not running
> false
> }
>
>
> postfix_status()
> {
> running
> }
>
> postfix_start()
> {
> # if Postfix is running return success
> if postfix_status; then
> ocf_log info "Postfix already running."
> return $OCF_SUCCESS
> fi
>
> # start Postfix
> $binary $OPTIONS start >/dev/null 2>&1
> ret=$?
>
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix returned error." $ret
> return $OCF_ERR_GENERIC
> fi
>
> return $OCF_SUCCESS
> }
>
>
> postfix_stop()
> {
> $binary $OPTIONS stop >/dev/null 2>&1
> ret=$?
>
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix returned an error while stopping." $ret
> return $OCF_ERR_GENERIC
> fi
>
> # grant some time for shutdown and recheck 5 times
> for i in 1 2 3 4 5; do
> if postfix_status; then
> sleep 1
> fi
> done
>
> # escalate to abort if we did not stop by now
> # @TODO shall we loop here too?
> if postfix_status; then
> ocf_log err "Postfix failed to stop. Escalating to 'abort'"
>
> $binary $OPTIONS abort >/dev/null 2>&1; ret=$?
> sleep 5
> postfix_status && return $OCF_ERR_GENERIC
> fi
>
> return $OCF_SUCCESS
> }
>
> postfix_reload()
> {
> if postfix_status; then
> ocf_log info "Reloading Postfix."
> $binary $OPTIONS reload
> fi
> }
>
> postfix_monitor()
> {
> if postfix_status; then
> return $OCF_SUCCESS
> fi
>
> return $OCF_NOT_RUNNING
> }
>
> postfix_validate_all()
> {
> # check that the Postfix binary exists and can be executed
> if [ ! -x "$binary" ]; then
> ocf_log err "Postfix binary '$binary' does not exist or cannot be executed."
> return $OCF_ERR_GENERIC
> fi
>
> # check config_dir and alternate_config_directories parameter
> if [ "x$config_dir" != "x" ]; then
> if [ ! -d "$config_dir" ]; then
> ocf_log err "Postfix configuration directory '$config_dir' does not exist."
> return $OCF_ERR_GENERIC
> fi
>
> alternate_config_directories=`postconf -h alternate_config_directories 2>/dev/null | grep $config_dir`
> if [ "x$alternate_config_directories" = "x" ]; then
> ocf_log err "Postfix main configuration must contain correct 'alternate_config_directories' parameter."
> return $OCF_ERR_GENERIC
> fi
> fi
>
> # check spool/queue directory
> queue=`postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null`
> if [ ! -d "$queue" ]; then
> ocf_log err "Postfix spool/queue directory '$queue' does not exist."
> return $OCF_ERR_GENERIC
> fi
>
> # run postfix internal check
> $binary $OPTIONS check >/dev/null 2>&1
> ret=$?
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix 'check' failed." $ret
> return $OCF_ERR_GENERIC
> fi
>
> return $OCF_SUCCESS
> }
>
> #
> # Main
> #
>
> if [ $# -ne 1 ]; then
> usage
> exit $OCF_ERR_ARGS
> fi
>
> binary=$OCF_RESKEY_binary
> config_dir=$OCF_RESKEY_config_dir
> parameters=$OCF_RESKEY_parameters
>
> # debugging stuff
> #echo OCF_RESKEY_binary=$OCF_RESKEY_binary >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
> #echo OCF_RESKEY_config_dir=$OCF_RESKEY_config_dir >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
> #echo OCF_RESKEY_parameters=$OCF_RESKEY_parameters >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
>
>
> # build postfix options string *outside* to access from each method
> OPTIONS=''
> OPTION_CONFIG_DIR=''
>
> # check if the Postfix config_dir is set
> if [ "x$config_dir" != "x" ]; then
> # save OPTION_CONFIG_DIR separately
> OPTION_CONFIG_DIR="-c $config_dir"
> OPTIONS=$OPTION_CONFIG_DIR
> fi
>
> if [ "x$parameters" != "x" ]; then
> OPTIONS="$OPTIONS $parameters"
> fi
>
> case $1 in
> meta-data) meta_data
> exit $OCF_SUCCESS
> ;;
>
> usage|help) usage
> exit $OCF_SUCCESS
> ;;
> esac
>
> postfix_validate_all
> ret=$?
>
> #echo "debug[$1:$ret]"
> LSB_STATUS_STOPPED=3
> if [ $ret -ne $OCF_SUCCESS ]; then
> case $1 in
> stop) exit $OCF_SUCCESS ;;
> monitor) exit $OCF_NOT_RUNNING;;
> status) exit $LSB_STATUS_STOPPED;;
> *) exit $ret;;
> esac
> fi
>
> case $1 in
> monitor) postfix_monitor
> exit $?
> ;;
> start) postfix_start
> exit $?
> ;;
>
> stop) postfix_stop
> exit $?
> ;;
>
> reload) postfix_reload
> exit $?
> ;;
>
> status) if postfix_status; then
> ocf_log info "Postfix is running."
> exit $OCF_SUCCESS
> else
> ocf_log info "Postfix is stopped."
> exit $OCF_NOT_RUNNING
> fi
> ;;
>
> monitor) postfix_monitor
> exit $?
> ;;
>
> validate-all) exit $OCF_SUCCESS
> ;;
>
> *) usage
> exit $OCF_ERR_UNIMPLEMENTED
> ;;
> esac
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/