Hi Dominik,
On Tue, Jun 17, 2008 at 11:28:17AM +0200, Dominik Klein wrote:
> Hi
>
> I wrote an RA, which can start virtually anything in a very basic meaning.
An interesting idea :)
> You can configure
> * binfile
> * cmdline_options
> * logfile
> * errlogfile
> * user
>
> Names should be self-explaining I guess.
>
> The RA starts the command configured with $binfile and $cmdline_options
> as $user and redirects stdout and stderr to appropriate files.
This may not be necessary. Whatever comes out on stdout/stderr
will be logged by lrmd.
> It stops the command with kill. If kill does not work, it uses kill -9.
I guess that you mean kill without options, which translates to
kill -TERM.
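A quick way to convince yourself, as a throwaway sketch (sleep is only a stand-in for the real program):

```shell
#!/bin/sh
# Start a throwaway background process to signal.
sleep 60 &
pid=$!

# kill with no signal option sends SIGTERM, the same as "kill -TERM".
kill "$pid"

# For a child terminated by a signal, wait reports 128 + signal number.
# SIGTERM is 15, so this should be 143.
wait "$pid"
term_status=$?
echo "exit status after plain kill: $term_status"
```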
> Monitors are done with ps. No deep check here but a pointer where to
> implement that if needed.
Perhaps implement a monitor script hook, such as the one in Xen.
That way one keeps the RA intact.
> If this is considered useful and you would like to include it into the
> project - please do. I'd like to contribute it under the GPL.
Great!
> This was written on Linux and as I lack experience with other *nixes,
> there may be places to improve for other operating systems.
>
> Let me know what you think and/or send patches if you want to improve
> the RA.
>
> Regards
> Dominik
>
> known limitations so far:
> It can not handle programs that display a different name in ps than you
> entered on the command line. For example, proftpd is started with
> something like "$binfile -c $configfile" but ps shows "proftpd
> (accepting connections)".
Why not use the pid file to check if the process is running?
Did you check start-stop-daemon? I'm not sure if we can use it,
since it's Linux specific, but there are certainly a few good
tips :)
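A minimal sketch of the pid file idea, under a few assumptions: the PIDFILE location and the two function names are made up for illustration, and sleep stands in for the real program. Note that when the command is launched through su, capturing the right pid needs more care than shown here.

```shell
#!/bin/sh
# Hypothetical pid file location; a real agent would derive it per resource.
PIDFILE="${TMPDIR:-/tmp}/anything_demo_$$.pid"

# Start a command in the background and record its pid.
start_with_pidfile() {
    "$@" &
    echo $! > "$PIDFILE"
}

# Succeed only if the pid file holds a number and that process is alive.
pidfile_running() {
    [ -f "$PIDFILE" ] || return 1
    pid=$(grep '^[0-9][0-9]*$' "$PIDFILE") || return 1
    kill -0 "$pid" 2>/dev/null
}

start_with_pidfile sleep 60
pidfile_running && echo "running"

# Clean up the demo process and its pid file.
kill "$(cat "$PIDFILE")"
wait
rm -f "$PIDFILE"
```

This no longer depends on what ps shows for the process, so the proftpd case would not be a problem.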
> #!/bin/bash
Better to use #!/bin/sh and avoid bashisms. As of the next release,
Debian will distribute dash as the default shell. I think that Ubuntu
already does that.
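For example, the let "i++" in the stop loop is a bashism. A portable version of that counter, as a sketch with the loop body left out:

```shell
#!/bin/sh
# Portable counter loop: POSIX arithmetic expansion instead of bash's let.
tries=5
i=0
while [ "$i" -lt "$tries" ]
do
    i=$((i + 1))
done
echo "looped $i times"
```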
> #
> # OCF Resource Agent compliant resource script.
> #
> # Copyright (c) 2008 IN-telegence GmbH & Co. KG, Dominik Klein
> # All Rights Reserved.
> #
> # This program is free software; you can redistribute it and/or modify
> # it under the terms of version 2 of the GNU General Public License as
> # published by the Free Software Foundation.
> #
> # This program is distributed in the hope that it would be useful, but
> # WITHOUT ANY WARRANTY; without even the implied warranty of
> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> #
> # Further, this software is distributed without any warranty that it is
> # free of the rightful claim of any third person regarding infringement
> # or the like. Any license provided herein, whether implied or
> # otherwise, applies only to this software file. Patent licenses, if
> # any, provided herein do not apply to combinations of this program with
> # other software, or any other product whatsoever.
> #
> # You should have received a copy of the GNU General Public License
> # along with this program; if not, write the Free Software Foundation,
> # Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
>
> # OCF instance parameters
> # OCF_RESKEY_binfile
> # OCF_RESKEY_cmdline_options
> # OCF_RESKEY_logfile
> # OCF_RESKEY_errlogfile
> # OCF_RESKEY_user
>
> # Initialization:
> . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
>
> anything_status() {
> [ -n "$cmdline_options" ] && cmd="$binfile $cmdline_options" ||
> cmd="$binfile"
> if pgrep -u $user -f "$cmd" > /dev/null 2>&1
number() { # make sure that the file contains a number
    grep '^[0-9][0-9]*$' "$1"
}
if test -f "$PIDFILE" && pid=`number "$PIDFILE"` && kill -0 "$pid"
> then
> return $OCF_RUNNING
> else
> return $OCF_NOT_RUNNING
> fi
> }
>
> anything_start() {
> if ! anything_status
> then
> if [ -n "$logfile" -a -n "$errlogfile" ]
> then
> # We have logfile and errlogfile, so redirect STDOUT and STDERR to different files
> cmd="su - $user -c \"nohup $binfile $cmdline_options >> $logfile 2>> $errlogfile &\""
> else if [ -n "$logfile" ]
> then
> # We only have logfile so redirect STDOUT and STDERR to the same file
> cmd="su - $user -c \"nohup $binfile $cmdline_options >> $logfile 2>&1 &\""
> else
> # We have neither logfile nor errlogfile, so redirect STDOUT and STDERR to a generic logfile
> cmd="su - $user -c \"nohup $binfile $cmdline_options >> /var/log/$(basename $binfile)_$(date +%Y.%m.%d) 2>&1 &\""
As I said above, I'd leave at least this part out. Perhaps also
all the logfile business.
> fi
> fi
> ocf_log debug "Starting $process: $cmd"
> # Execute the command as created above
> eval $cmd
> if anything_status
> then
> ocf_log debug "$process: $cmd started successfully"
> return $OCF_SUCCESS
> else
> ocf_log err "$process: $cmd could not be started"
> return $OCF_ERR_GENERIC
> fi
> else
> # If already running, consider start successful
> ocf_log debug "$process: $cmd is already running"
> return $OCF_SUCCESS
> fi
> }
>
> anything_stop() {
> if anything_status
> then
> tries=5
> i=0
> while [ $i -lt $tries ]
> do
> # there may be programs without command line options
> [ -n "$cmdline_options" ] && cmd="$binfile $cmdline_options" || cmd="$binfile"
> pkill -u $user -f "$cmd"
It should be enough to send the signal once. So, pkill should be
moved out of the loop.
> sleep 1
> if ! anything_status
> then
> return $OCF_SUCCESS
> fi
> let "i++"
> done
It is arguably wrong to limit the time to stop. One should let
the user do that by specifying the operation timeout. OTOH, I
believe that most resource agents are doing this in a similar
way, which still doesn't make it right. Then again, the agent
should try 'kill -9' eventually. To make this perfect, one would
need to define two timeouts: one for regular stop and one for
oh-yes-you'll-be-stopped-no-matter-what.
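Roughly, as a sketch: the function name and the two timeout arguments (in seconds) are made up here, the RA has no such parameters. It works on a pid rather than on pkill patterns to keep the example short. SIGTERM is sent once, then the loop only polls.

```shell
#!/bin/sh
# stop_with_escalation PID TERM_WAIT KILL_WAIT
# TERM_WAIT and KILL_WAIT are hypothetical timeouts in seconds.
stop_with_escalation() {
    pid=$1
    term_wait=$2
    kill_wait=$3

    # Send SIGTERM once, then only poll until the process is gone.
    kill "$pid" 2>/dev/null
    i=0
    while [ "$i" -lt "$term_wait" ]
    do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done

    # The regular stop timed out: escalate to SIGKILL and poll again.
    kill -9 "$pid" 2>/dev/null
    i=0
    while [ "$i" -lt "$kill_wait" ]
    do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Usage: sleep stands in for the managed program.
sleep 60 &
stop_with_escalation $! 5 5 && echo "stopped"
```

Both values together would still have to fit inside the operation timeout configured for stop.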
> # one last attempt with sigkill
> ocf_log warn "Stop $process: Looks like $process could not be stopped by SIGTERM, now sending SIGKILL"
> ocf_log warn "$(pgrep -u $user -f \"$cmd\" -l)"
> pkill -u $user -9 -f "$cmd"
> if ! anything_status
> then
> ocf_log debug "Stop $process: Seems like SIGKILL did the job"
> ocf_log debug "$(pgrep -u $user -f \"$cmd\" -l)"
> return $OCF_SUCCESS
> else
> ocf_log err "Stop $process: failed"
> return $OCF_ERR_GENERIC
> fi
> else
> # was not running, so stop can be considered successful
> return $OCF_SUCCESS
> fi
> return $OCF_ERR_GENERIC
> }
>
> anything_monitor() {
> anything_status
> ret=$?
> if [ $ret -eq $OCF_SUCCESS ]
> then
> # implement your deeper monitor operation here
if [ -n "$OCF_RESKEY_monitor_hook" ]; then
eval "$OCF_RESKEY_monitor_hook"
else
true
fi
> return $OCF_SUCCESS
return # $? is implied
> [skipped the rest]
Cheers,
Dejan