Good morning Dejan,
thanks for your reply.
I wrote an RA which can start virtually anything, in a very basic sense.
An interesting idea :)
I wrote it because I have a LOT of custom programs that need to be made
highly available, and I needed something into which I can throw a binfile
and some command-line parameters.
Once it worked the way I need it to, I thought it might be useful for
others as well.
The RA starts the command configured with $binfile and $cmdline_options
as $user and redirects stdout and stderr to appropriate files.
This may not be necessary. Whatever comes out on stdout/stderr
will be logged by lrmd.
It is necessary in my case.
How about not redirecting anything if neither logfile nor errlogfile is set?
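To make that suggestion concrete, here is a minimal sketch of how the redirection part could be built so that nothing is redirected when neither parameter is set. build_redirect is a hypothetical helper name, not part of the RA; the errlogfile-only case is my own assumption, and the sketch assumes the file names contain no whitespace.

```sh
#!/bin/sh
# Hypothetical helper: print the redirection suffix for the command line.
# $1 = logfile, $2 = errlogfile (either may be empty).
build_redirect() {
    if [ -n "$1" ] && [ -n "$2" ]; then
        # both set: stdout and stderr go to separate files
        printf '%s' ">> $1 2>> $2"
    elif [ -n "$1" ]; then
        # only logfile: both streams go to the same file
        printf '%s' ">> $1 2>&1"
    elif [ -n "$2" ]; then
        # only errlogfile (assumption): redirect stderr, leave stdout alone
        printf '%s' "2>> $2"
    fi
    # neither set: print nothing, so lrmd logs stdout/stderr as usual
}
```

The result could then be appended to "$binfile $cmdline_options" before the command is handed to su.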
It stops the command with kill. If kill does not work, it uses kill -9.
I guess that you mean kill without options which translates to
kill -TERM.
Right.
Monitors are done with ps. No deep check here but a pointer where to
implement that if needed.
Perhaps implement a monitor script hook, such as the one in Xen.
That way one keeps the RA intact.
I will look at that and see how that works.
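As a rough sketch of what such a hook could look like (not the way Xen does it, which I have not checked yet): the basic check stays in the RA, and an optional user-supplied command decides the deep check. deep_monitor is a hypothetical helper name; the parameter name OCF_RESKEY_monitor_hook matches the one used further down in this thread, and the two return codes are the usual OCF values.

```sh
#!/bin/sh
# Return codes as used by OCF resource agents.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical helper: run the optional deep check once the basic
# process check has already passed.
deep_monitor() {
    # no hook configured: the basic check is all we have
    if [ -z "${OCF_RESKEY_monitor_hook:-}" ]; then
        return $OCF_SUCCESS
    fi
    # the hook is an arbitrary shell command; its exit status decides
    if eval "$OCF_RESKEY_monitor_hook"; then
        return $OCF_SUCCESS
    fi
    return $OCF_ERR_GENERIC
}
```

The advantage is that users can plug in their own check without touching the RA itself.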
Why not use the pid file to check if the process is running?
Did you check start-stop-daemon? I'm not sure if we can use it,
since it's Linux specific, but there are certainly a few good
tips :)
AFAIK, not all programs write PID files; for example, none of my custom
programs do. If there's a way to generate one after starting $binfile,
let me know.
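One possible way, sketched here under the assumption that the program is started directly in the background by the shell: $! holds the PID of the most recent background job, so the RA can write the PID file itself. start_with_pidfile and pidfile_running are hypothetical helper names, not existing functions.

```sh
#!/bin/sh
# Hypothetical helper: start a command in the background and record its PID.
# $1 = PID file, remaining arguments = command and its options.
start_with_pidfile() {
    _pidfile=$1
    shift
    "$@" >/dev/null 2>&1 &
    # $! is the PID of the background job just started
    echo $! > "$_pidfile"
}

# Hypothetical helper: succeed only if the PID file names a live process.
pidfile_running() {
    [ -f "$1" ] || return 1
    _pid=$(cat "$1")
    case $_pid in
        ''|*[!0-9]*) return 1 ;;    # empty or not a plain number
    esac
    # signal 0 only tests whether the process exists
    kill -0 "$_pid" 2>/dev/null
}
```

If the command is wrapped in su -c, $! would have to be expanded by the inner shell instead; that variant is not shown here.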
#!/bin/bash
Better to use #!/bin/sh. As of the next release, Debian will
distribute dash as the default shell. I think that Ubuntu already
does that.
K.
#
# OCF Resource Agent compliant resource script.
#
# Copyright (c) 2008 IN-telegence GmbH & Co. KG, Dominik Klein
# All Rights Reserved.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like. Any license provided herein, whether implied or
# otherwise, applies only to this software file. Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
# OCF instance parameters
# OCF_RESKEY_binfile
# OCF_RESKEY_cmdline_options
# OCF_RESKEY_logfile
# OCF_RESKEY_errlogfile
# OCF_RESKEY_user
# Initialization:
. ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
anything_status() {
[ -n "$cmdline_options" ] && cmd="$binfile $cmdline_options" ||
cmd="$binfile"
if pgrep -u $user -f "$cmd" > /dev/null 2>&1
function number { # make sure that the file contains a number
grep '^[0-9][0-9]*$' $1
}
if test -f $PIDFILE && pid=`number $PIDFILE` && kill -0 $pid
where/how do you get $PIDFILE?
And it should be quoted I guess.
then
return $OCF_RUNNING
else
return $OCF_NOT_RUNNING
fi
}
anything_start() {
if ! anything_status
then
if [ -n "$logfile" -a -n "$errlogfile" ]
then
# We have logfile and errlogfile, so redirect STDOUT and STDERR to different files
cmd="su - $user -c \"nohup $binfile $cmdline_options >> $logfile 2>> $errlogfile &\""
else if [ -n "$logfile" ]
then
# We only have logfile, so redirect STDOUT and STDERR to the same file
cmd="su - $user -c \"nohup $binfile $cmdline_options >> $logfile 2>&1 &\""
else
# We have neither logfile nor errlogfile, so redirect STDOUT and STDERR to a generic logfile
cmd="su - $user -c \"nohup $binfile $cmdline_options >> /var/log/$(basename $binfile)\_$(date +%Y.%m.%d) 2>&1 &\""
As I said above, I'd leave at least this part out. Perhaps also
all the logfile business.
See my suggestion above. I think that is the most flexible way.
fi
fi
ocf_log debug "Starting $process: $cmd"
# Execute the command as created above
eval $cmd
if anything_status
then
ocf_log debug "$process: $cmd started successfully"
return $OCF_SUCCESS
else
ocf_log err "$process: $cmd could not be started"
return $OCF_ERR_GENERIC
fi
else
# If already running, consider start successful
ocf_log debug "$process: $cmd is already running"
return $OCF_SUCCESS
fi
}
anything_stop() {
if anything_status
then
tries=5
i=0
while [ $i -lt $tries ]
do
# there may be programs without command line options
[ -n "$cmdline_options" ] && cmd="$binfile $cmdline_options" ||
cmd="$binfile"
pkill -u $user -f "$cmd"
It should be enough to send the signal once. So, pkill should be
moved out of the loop.
Right.
sleep 1
if ! anything_status
then
return $OCF_SUCCESS
fi
let "i++"
done
It is arguably wrong to limit the time to stop. One should let
the user do that by specifying the operation timeout. OTOH, I
believe that most resource agents are doing this in a similar
way, which still doesn't make it right. Then again, the agent
should try 'kill -9' eventually. To make this perfect, one would
need to define two timeouts: one for regular stop and one for
oh-yes-you'll-be-stopped-no-matter-what.
So how about adding "tries", under a more intuitive name, as a
configuration option? It could default to 10000 or something and be
overridden if it is set. While unset, kill -9 would never be used. If
set, people need to set it lower than the stop timeout to make it work.
I guess the timeout is not visible from inside the RA, so this should be
made clear in the header of the script and in the meta-data.
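Roughly what I have in mind, as a sketch only: send SIGTERM once, poll until a configurable grace period is over, and only then use SIGKILL. For simplicity it works on a single PID rather than on the pkill pattern the RA uses now; stop_with_escalation and the default of 10 seconds are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: stop a process, escalating to SIGKILL after a grace period.
# $1 = PID, $2 = grace period in seconds (assumed default: 10).
stop_with_escalation() {
    _pid=$1
    _grace=${2:-10}
    # send SIGTERM exactly once; if that fails the process is already gone
    kill "$_pid" 2>/dev/null || return 0
    _i=0
    while [ "$_i" -lt "$_grace" ]; do
        # signal 0 only tests whether the process still exists
        kill -0 "$_pid" 2>/dev/null || return 0
        sleep 1
        _i=$((_i + 1))
    done
    # grace period is over: last attempt with SIGKILL
    kill -9 "$_pid" 2>/dev/null
    sleep 1
    if kill -0 "$_pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

The grace period would have to be shorter than the stop timeout configured in the CIB, otherwise the SIGKILL step is never reached.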
# one last attempt with sigkill
ocf_log warn "Stop $process: Looks like $process could not be stopped by SIGTERM, now sending SIGKILL"
ocf_log warn "$(pgrep -u $user -f "$cmd" -l)"
pkill -u $user -9 -f "$cmd"
if ! anything_status
then
ocf_log debug "Stop $process: Seems like SIGKILL did the job"
ocf_log debug "$(pgrep -u $user -f "$cmd" -l)"
return $OCF_SUCCESS
else
ocf_log err "Stop $process: failed"
return $OCF_ERR_GENERIC
fi
else
# was not running, so stop can be considered successful
return $OCF_SUCCESS
fi
return $OCF_ERR_GENERIC
}
anything_monitor() {
anything_status
ret=$?
if [ $ret -eq $OCF_SUCCESS ]
then
# implement your deeper monitor operation here
if [ -n "$OCF_RESKEY_monitor_hook" ]; then
eval "$OCF_RESKEY_monitor_hook"
else
true
fi
Oh, you already did what I wanted to look up. Thanks :)
return $OCF_SUCCESS
return # $? is implied
See, learned something - again :)
Regards
Dominik