Hi Dominik,

On Tue, Jun 17, 2008 at 11:28:17AM +0200, Dominik Klein wrote:
> Hi
>
> I wrote an RA, which can start virtually anything in a very basic meaning.

An interesting idea :)

> You can configure
> * binfile
> * cmdline_options
> * logfile
> * errlogfile
> * user
>
> Names should be self-explaining I guess.
>
> The RA starts the command configured with $binfile and $cmdline_options
> as $user and redirects stdout and stderr to appropriate files.

This may not be necessary. Whatever comes out on stdout/stderr
will be logged by lrmd.

> It stops the command with kill. If kill does not work, it uses kill -9.

I guess you mean kill without options, which sends SIGTERM (the
same as kill -TERM).

> Monitors are done with ps. No deep check here but a pointer where to
> implement that if needed.

Perhaps implement a monitor script hook, such as the one in Xen.
That way one keeps the RA intact.

> If this is considered useful and you would like to include it into the
> project - please do. I'd like to contribute it under the GPL.

Great!

> This was written on Linux and as I lack experience with other *nixes,
> there may be places to improve for other operating systems.
>
> Let me know what you think and/or send patches if you want to improve
> the RA.
>
> Regards
> Dominik
>
> known limitations so far:
> It can not handle programs that display a different name in ps than you
> entered on the command line. For example, proftpd is started with
> something like "$binfile -c $configfile" but ps shows "proftpd
> (accepting connections)".

Why not use the pid file to check if the process is running?
Did you look at start-stop-daemon? I'm not sure if we can use it,
since it's Linux-specific, but it certainly contains a few good
ideas :)
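
To illustrate the pid file idea, here is a rough sketch. The helper
names start_with_pidfile and running_from_pidfile are made up for this
example and are not part of the RA. It assumes the program stays in the
foreground of its own process (does not daemonize itself), since
otherwise $! would not be the pid we want to track. Starting through
"su -c" as the RA does now would need a different way to obtain the pid.

```shell
# Hypothetical helper: start a command in the background and record its pid.
# $1 = pid file, remaining arguments = command and its options
start_with_pidfile() {
        pidfile=$1
        shift
        "$@" >/dev/null 2>&1 &
        echo $! > "$pidfile"
}

# Hypothetical helper: succeeds only if the pid file exists, holds a
# plain number, and a process with that pid is still alive.
running_from_pidfile() {
        [ -f "$1" ] || return 1
        pid=`grep '^[0-9][0-9]*$' "$1"` || return 1
        kill -0 "$pid" 2>/dev/null
}
```

With something like this, status no longer depends on what ps shows,
which would sidestep the proftpd problem described above.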

> #!/bin/bash

Better to use #!/bin/sh. As of the next release, Debian will
ship dash as the default /bin/sh. I think that Ubuntu already
does that.
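
For example, two constructs that work in bash but that dash does not
accept are the "function" keyword and "let". A minimal sketch of the
portable forms (count_tries is just an illustrative name, not something
from the RA):

```shell
#!/bin/sh
# Portable replacements for two bashisms; count_tries is a made-up name.

# bash only:   function count_tries { ... }
# portable:    count_tries() { ... }
count_tries() {
        i=0
        while [ "$i" -lt "$1" ]
        do
                # bash only:   let "i++"
                i=$((i + 1))
        done
        echo "$i"
}
```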

> #
> #       OCF Resource Agent compliant resource script.
> #
> # Copyright (c) 2008 IN-telegence GmbH & Co. KG, Dominik Klein
> #                    All Rights Reserved.
> #
> # This program is free software; you can redistribute it and/or modify
> # it under the terms of version 2 of the GNU General Public License as
> # published by the Free Software Foundation.
> #
> # This program is distributed in the hope that it would be useful, but
> # WITHOUT ANY WARRANTY; without even the implied warranty of
> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> #
> # Further, this software is distributed without any warranty that it is
> # free of the rightful claim of any third person regarding infringement
> # or the like.  Any license provided herein, whether implied or
> # otherwise, applies only to this software file.  Patent licenses, if
> # any, provided herein do not apply to combinations of this program with
> # other software, or any other product whatsoever.
> #
> # You should have received a copy of the GNU General Public License
> # along with this program; if not, write the Free Software Foundation,
> # Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
> 
> # OCF instance parameters
> #       OCF_RESKEY_binfile
> #       OCF_RESKEY_cmdline_options
> #       OCF_RESKEY_logfile
> #       OCF_RESKEY_errlogfile
> #       OCF_RESKEY_user
> 
> # Initialization:
> . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
> 
> anything_status() {
>       [ -n "$cmdline_options" ] && cmd="$binfile $cmdline_options" || cmd="$binfile"
>       if pgrep -u $user -f "$cmd" > /dev/null 2>&1

number() { # make sure that the file contains a number
        grep '^[0-9][0-9]*$' "$1"
}

if test -f "$PIDFILE" && pid=`number "$PIDFILE"` && kill -0 "$pid" 2>/dev/null
>       then
>               return $OCF_RUNNING
>       else
>               return $OCF_NOT_RUNNING
>       fi
> }
> 
> anything_start() {
>       if ! anything_status
>       then
>               if [ -n "$logfile" -a -n "$errlogfile" ]
>               then
>                       # We have logfile and errlogfile, so redirect STDOUT and STDERR to different files
>                       cmd="su - $user -c \"nohup $binfile $cmdline_options >> $logfile 2>> $errlogfile &\""
>               else if [ -n "$logfile" ]
>                       then
>                               # We only have logfile so redirect STDOUT and STDERR to the same file
>                               cmd="su - $user -c \"nohup $binfile $cmdline_options >> $logfile 2>&1 &\""
>                       else
>                               # We have neither logfile nor errlogfile, so redirect STDOUT and STDERR to a generic logfile
>                               cmd="su - $user -c \"nohup $binfile $cmdline_options >> /var/log/$(basename $binfile)\_$(date +%Y.%m.%d) 2>&1 &\""

As I said above, I'd leave at least this part out. Perhaps also
all the logfile business.

>                       fi
>               fi
>               ocf_log debug "Starting $process: $cmd"
>               # Execute the command as created above
>               eval $cmd
>               if anything_status
>               then
>                       ocf_log debug "$process: $cmd started successfully"
>                       return $OCF_SUCCESS
>               else 
>                       ocf_log err "$process: $cmd could not be started"
>                       return $OCF_ERR_GENERIC
>               fi
>       else
>               # If already running, consider start successful
>               ocf_log debug "$process: $cmd is already running"
>               return $OCF_SUCCESS
>       fi
> }
> 
> anything_stop() {
>       if anything_status
>       then
>               tries=5
>               i=0
>               while [ $i -lt $tries ]
>               do
>                       # there may be programs without command line options
>                       [ -n "$cmdline_options" ] && cmd="$binfile $cmdline_options" || cmd="$binfile"
>                       pkill -u $user -f "$cmd"

It should be enough to send the signal once. So, pkill should be
moved out of the loop.
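
Roughly like this, as a sketch only. term_and_wait is a made-up helper
that works on a pid for simplicity. In the RA as posted you would keep
pkill and anything_status instead, but the shape is the same: signal
once, then only poll.

```shell
# Hypothetical helper: send SIGTERM once, then poll until the process is gone.
# $1 = pid, $2 = number of one-second polls before giving up
term_and_wait() {
        kill -TERM "$1" 2>/dev/null
        n=0
        while [ "$n" -lt "$2" ]
        do
                # kill -0 sends no signal, it only tests whether the pid exists
                kill -0 "$1" 2>/dev/null || return 0
                sleep 1
                n=$((n + 1))
        done
        return 1
}
```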

>                       sleep 1
>                       if ! anything_status
>                       then
>                               return $OCF_SUCCESS
>                       fi
>                       let "i++"
>               done

It is arguably wrong to limit the time to stop. One should let
the user do that by specifying the operation timeout. OTOH, I
believe that most resource agents are doing this in a similar
way, which still doesn't make it right. Then again, the agent
should try 'kill -9' eventually. To make this perfect, one would
need to define two timeouts: one for regular stop and one for
oh-yes-you'll-be-stopped-no-matter-what.
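
A sketch of what the two-timeout variant might look like. Everything
here is hypothetical: the helper names, and the idea of passing both
timeouts as arguments. In a real RA they would probably come from
instance parameters, which do not exist yet. It also works on a pid
rather than on pkill patterns, to keep the example short.

```shell
# Hypothetical helper: poll once per second until pid $1 is gone.
# Gives up and returns 1 after $2 seconds.
wait_gone() {
        n=0
        while kill -0 "$1" 2>/dev/null
        do
                [ "$n" -ge "$2" ] && return 1
                sleep 1
                n=$((n + 1))
        done
        return 0
}

# Hypothetical helper: regular stop first, forced stop afterwards.
# $1 = pid, $2 = seconds to wait after SIGTERM, $3 = seconds to wait after SIGKILL
stop_escalating() {
        kill -TERM "$1" 2>/dev/null
        wait_gone "$1" "$2" && return 0
        kill -KILL "$1" 2>/dev/null
        wait_gone "$1" "$3"
}
```

The sum of both timeouts should of course stay below the operation
timeout configured for stop, otherwise lrmd gives up first.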

>               # one last attempt with sigkill
>               ocf_log warn "Stop $process: Looks like $process could not be stopped by SIGTERM, now sending SIGKILL"
>               ocf_log warn "$(pgrep -u $user -f \"$cmd\" -l)"
>               pkill -u $user -9 -f "$cmd"
>               if ! anything_status
>               then
>                       ocf_log debug "Stop $process: Seems like SIGKILL did the job"
>                       ocf_log debug "$(pgrep -u $user -f \"$cmd\" -l)"
>                       return $OCF_SUCCESS
>               else
>                       ocf_log err "Stop $process: failed"
>                       return $OCF_ERR_GENERIC
>               fi
>       else
>               # was not running, so stop can be considered successful
>               return $OCF_SUCCESS
>       fi
>       return $OCF_ERR_GENERIC
> }
> 
> anything_monitor() {
>       anything_status
>       ret=$?
>       if [ $ret -eq $OCF_SUCCESS ]
>       then
>               # implement your deeper monitor operation here

if [ -n "$OCF_RESKEY_monitor_hook" ]; then
        eval "$OCF_RESKEY_monitor_hook"
else
        true
fi

>               return $OCF_SUCCESS

return          # $? is implied

> [skipped the rest]

Cheers,

Dejan
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
