Good morning Dejan,

thanks for your reply.

I wrote an RA that can start virtually anything, in a very basic sense.

An interesting idea :)

I wrote it because I have a LOT of custom programs that I need to make highly available, and I needed something I can just hand a binfile and some command line parameters. Once it worked the way I needed it to, I thought it might be useful for others.

The RA starts the command configured with $binfile and $cmdline_options
as $user and redirects stdout and stderr to appropriate files.

This may not be necessary. Whatever comes out on stdout/stderr
will be logged by lrmd.

It is necessary in my case.

How about not redirecting anything if neither logfile nor errlogfile is set?
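A minimal sketch of that suggestion: the redirection is assembled only from the parameters that are actually set, and nothing is redirected when both are empty. build_redirect is a hypothetical helper, and the errlogfile-only branch is an addition; paths containing spaces would need extra quoting once the string goes through su -c and eval.

```shell
#!/bin/sh
# Sketch: print a redirection suffix built only from the parameters that are
# set ($1 = logfile, $2 = errlogfile). Prints nothing when both are empty, so
# stdout/stderr are left alone and lrmd logs them.
build_redirect() {
    if [ -n "$1" ] && [ -n "$2" ]; then
        printf '>> %s 2>> %s' "$1" "$2"   # separate files for stdout and stderr
    elif [ -n "$1" ]; then
        printf '>> %s 2>&1' "$1"          # both streams into logfile
    elif [ -n "$2" ]; then
        printf '2>> %s' "$2"              # only stderr is redirected
    fi
}
```

In anything_start this could then become something like cmd="su - $user -c \"nohup $binfile $cmdline_options $(build_redirect "$logfile" "$errlogfile") &\"" followed by the existing eval.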

It stops the command with kill. If kill does not work, it uses kill -9.

I guess that you mean kill without options which translates to
kill -TERM.

Right.

Monitoring is done with ps. There is no deep check here, but there is a
pointer to where one could be implemented if needed.

Perhaps implement a monitor script hook, such as the one in Xen.
That way one keeps the RA intact.

I will look into that and see how it works.

Why not use the pid file to check if the process is running?
Did you check start-stop-daemon? I'm not sure if we can use it,
since it's Linux-specific, but there are certainly a few good
tips in it :)

AFAIK, not all programs write PID files; none of my custom programs do, for example. If there's a way to generate one after starting $binfile, let me know.
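One way to get a pid file for programs that don't write one, assuming the program stays in the foreground (it does not fork into the background by itself): start it in the background and record $! right away. start_with_pidfile is a hypothetical helper. On Debian, start-stop-daemon can do the same with its --background and --make-pidfile options.

```shell
#!/bin/sh
# Sketch: start a command in the background and write its PID to a file.
# Only correct for programs that do not daemonize themselves; otherwise $!
# is the PID of a parent that exits immediately.
start_with_pidfile() {
    pidfile=$1
    shift
    nohup "$@" >/dev/null 2>&1 &
    echo $! > "$pidfile"   # nohup execs the command, so $! is its PID
}
```

Under su, the PID has to be printed from inside the user's shell, for example su - "$user" -c "nohup $binfile $cmdline_options >/dev/null 2>&1 & echo \$!" > "$pidfile", where $pidfile would be a new (hypothetical) parameter.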

#!/bin/bash

Better to use #!/bin/sh. As of the next release, Debian will
ship dash as the default /bin/sh. I think that Ubuntu already
does that.

K.
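For the switch to #!/bin/sh, the two bash-only constructs in this thread have direct POSIX equivalents: the function keyword used for number, and let "i++" in anything_stop. A small sketch that runs under dash:

```shell
#!/bin/sh
# bash:  function number { ... }   ->  POSIX: name() { ... }
number() { # print the file's content only if it is a plain number
    grep '^[0-9][0-9]*$' "$1"
}

# bash:  let "i++"                 ->  POSIX arithmetic expansion
i=0
while [ "$i" -lt 5 ]; do
    i=$((i + 1))
done
```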

#
#       OCF Resource Agent compliant resource script.
#
# Copyright (c) 2008 IN-telegence GmbH & Co. KG, Dominik Klein
#                    All Rights Reserved.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like.  Any license provided herein, whether implied or
# otherwise, applies only to this software file.  Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.

# OCF instance parameters
#       OCF_RESKEY_binfile
#       OCF_RESKEY_cmdline_options
#       OCF_RESKEY_logfile
#       OCF_RESKEY_errlogfile
#       OCF_RESKEY_user

# Initialization:
. ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs

anything_status() {
        [ -n "$cmdline_options" ] && cmd="$binfile $cmdline_options" || cmd="$binfile"
        if pgrep -u $user -f "$cmd" > /dev/null 2>&1

function number { # make sure that the file contains a number
        grep '^[0-9][0-9]*$' $1
}

if test -f $PIDFILE && pid=`number $PIDFILE` && kill -0 $pid

Where/how do you get $PIDFILE?

And it should be quoted, I guess.

        then
                return $OCF_SUCCESS
        else
                return $OCF_NOT_RUNNING
        fi
}
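A sketch of the pid-file check with the variables quoted, assuming the start action has written the PID to $PIDFILE (the RA does not set that variable yet). 0 and 7 are the standard OCF values of $OCF_SUCCESS and $OCF_NOT_RUNNING; the defaults are only there so the snippet runs outside an RA. kill -0 needs the RA to run as root (or as the process owner) to probe the process.

```shell
#!/bin/sh
OCF_SUCCESS=${OCF_SUCCESS:-0}
OCF_NOT_RUNNING=${OCF_NOT_RUNNING:-7}

number() { # print the file's content only if it is a plain number
    grep '^[0-9][0-9]*$' "$1"
}

pidfile_status() {
    # running only if the file exists, holds a number, and that PID exists
    if [ -f "$PIDFILE" ] && pid=$(number "$PIDFILE") && kill -0 "$pid" 2>/dev/null
    then
        return "$OCF_SUCCESS"
    else
        return "$OCF_NOT_RUNNING"
    fi
}
```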

anything_start() {
        if ! anything_status
        then
                if [ -n "$logfile" -a -n "$errlogfile" ]
                then
                        # We have logfile and errlogfile, so redirect STDOUT and STDERR to different files
                        cmd="su - $user -c \"nohup $binfile $cmdline_options >> $logfile 2>> $errlogfile &\""
                else if [ -n "$logfile" ]
                        then
                                # We only have logfile, so redirect STDOUT and STDERR to the same file
                                cmd="su - $user -c \"nohup $binfile $cmdline_options >> $logfile 2>&1 &\""
                        else
                                # We have neither logfile nor errlogfile, so redirect STDOUT and STDERR to a generic logfile
                                cmd="su - $user -c \"nohup $binfile $cmdline_options >> /var/log/$(basename $binfile)\_$(date +%Y.%m.%d) 2>&1 &\""

As I said above, I'd leave at least this part out. Perhaps also
all the logfile business.

See my suggestion above. I think that's the most flexible way.

                        fi
                fi
                ocf_log debug "Starting $process: $cmd"
                # Execute the command as created above
                eval $cmd
                if anything_status
                then
                        ocf_log debug "$process: $cmd started successfully"
                        return $OCF_SUCCESS
                else
                        ocf_log err "$process: $cmd could not be started"
                        return $OCF_ERR_GENERIC
                fi
        else
                # If already running, consider start successful
                ocf_log debug "$process: $cmd is already running"
                return $OCF_SUCCESS
        fi
}

anything_stop() {
        if anything_status
        then
                tries=5
                i=0
                while [ $i -lt $tries ]
                do
                        # there may be programs without command line options
                        [ -n "$cmdline_options" ] && cmd="$binfile $cmdline_options" || cmd="$binfile"
                        pkill -u $user -f "$cmd"

It should be enough to send the signal once. So, pkill should be
moved out of the loop.

Right.

                        sleep 1
                        if ! anything_status
                        then
                                return $OCF_SUCCESS
                        fi
                        let "i++"
                done

It is arguably wrong to limit the time to stop. One should let
the user do that by specifying the operation timeout. OTOH, I
believe that most resource agents are doing this in a similar
way, which still doesn't make it right. Then again, the agent
should try 'kill -9' eventually. To make this perfect, one would
need to define two timeouts: one for regular stop and one for
oh-yes-you'll-be-stopped-no-matter-what.

So how about adding "tries", under a more intuitive name, as a configuration option? It could default to 10000 or so and be overridden if it is set. With the default, kill -9 would effectively never be used. If it is set, people need to set it lower than the stop timeout for it to work. I guess the timeout is not visible from inside the RA, so this should be made clear in the header of the script and in the meta-data.

                # one last attempt with sigkill
                ocf_log warn "Stop $process: Looks like $process could not be stopped by SIGTERM, now sending SIGKILL"
                ocf_log warn "$(pgrep -u "$user" -f "$cmd" -l)"
                pkill -u $user -9 -f "$cmd"
                if ! anything_status
                then
                        ocf_log debug "Stop $process: Seems like SIGKILL did the job"
                        ocf_log debug "$(pgrep -u "$user" -f "$cmd" -l)"
                        return $OCF_SUCCESS
                else
                        ocf_log err "Stop $process: failed"
                        return $OCF_ERR_GENERIC
                fi
        else
                # was not running, so stop can be considered successful
                return $OCF_SUCCESS
        fi
        return $OCF_ERR_GENERIC
}
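A sketch of the stop sequence discussed above: SIGTERM is sent once, the process is polled every second for up to kill_after seconds, and SIGKILL is sent only if that grace period is configured and used up. kill_after stands in for the proposed, not yet named, configuration option. The process is addressed by PID so the snippet is self-contained; the RA would use pkill -u "$user" -f "$cmd" instead. is_running relies on ps -o stat= (procps/BSD ps, not plain POSIX) so that zombies count as stopped, since kill -0 still succeeds for them.

```shell
#!/bin/sh
OCF_SUCCESS=${OCF_SUCCESS:-0}
OCF_ERR_GENERIC=${OCF_ERR_GENERIC:-1}

is_running() {
    # ps fails once the PID is gone; a zombie is dead, just not reaped yet
    state=$(ps -o stat= -p "$1" 2>/dev/null) || return 1
    case $state in
        *Z*) return 1 ;;
    esac
    return 0
}

stop_pid() {
    pid=$1
    kill_after=$2   # hypothetical option; empty means never send SIGKILL
    is_running "$pid" || return "$OCF_SUCCESS"   # not running: nothing to do

    kill -TERM "$pid" 2>/dev/null   # SIGTERM is sent exactly once
    waited=0
    while is_running "$pid"; do
        if [ -n "$kill_after" ] && [ "$waited" -ge "$kill_after" ]; then
            kill -KILL "$pid" 2>/dev/null   # grace period used up
            sleep 1
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done

    is_running "$pid" && return "$OCF_ERR_GENERIC"
    return "$OCF_SUCCESS"
}
```

Leaving kill_after empty means only the stop operation timeout bounds the stop, which matches the point that the user should control the time limit.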

anything_monitor() {
        anything_status
        ret=$?
        if [ $ret -eq $OCF_SUCCESS ]
        then
                # implement your deeper monitor operation here

if [ -n "$OCF_RESKEY_monitor_hook" ]; then
        eval "$OCF_RESKEY_monitor_hook"
else
        true
fi

Oh, you already did what I wanted to look up. Thanks :)

                return $OCF_SUCCESS

return          # $? is implied

See, I learned something, again :)
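A sketch of a monitor that combines the hook with an explicit mapping to OCF codes. A plain return passes the hook's exit status through unchanged, which is fine if the hook already returns OCF codes; a generic check script may exit with any value, so here a failing hook becomes $OCF_ERR_GENERIC. anything_status is stubbed, and STUB_STATUS is a hypothetical test knob; in the RA it is the pgrep-based check.

```shell
#!/bin/sh
OCF_SUCCESS=${OCF_SUCCESS:-0}
OCF_ERR_GENERIC=${OCF_ERR_GENERIC:-1}
OCF_NOT_RUNNING=${OCF_NOT_RUNNING:-7}

# Stub for the sketch only
anything_status() {
    return "${STUB_STATUS:-$OCF_SUCCESS}"
}

anything_monitor() {
    status_rc=0
    anything_status || status_rc=$?
    # not running (or a status error): report it unchanged
    [ "$status_rc" -eq "$OCF_SUCCESS" ] || return "$status_rc"
    # no hook configured: a running process is all we check
    [ -n "${OCF_RESKEY_monitor_hook:-}" ] || return "$OCF_SUCCESS"
    # any non-zero exit from the hook counts as a generic failure
    if eval "$OCF_RESKEY_monitor_hook"; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_ERR_GENERIC"
}
```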

Regards
Dominik
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
