Hi,

On Wed, Jun 18, 2008 at 08:55:23AM +0200, Dominik Klein wrote:
> Good morning Dejan
>
> thanks for your reply.
>
>>> I wrote an RA, which can start virtually anything in a very basic 
>>> meaning.
>> An interesting idea :)
>
> I wrote it because I have a LOT of custom programs to make HA and I needed 
> something that I can throw in a binfile and some command line parameters.
> After I had it the way it worked for what I need, I just thought it might 
> be useful for others.
>
>>> The RA starts the command configured with $binfile and $cmdline_options
>>> as $user and redirects stdout and stderr to appropriate files.
>> This may not be necessary. Whatever comes out on stdout/stderr
>> will be logged by lrmd.
>
> It is necessary in my case.
>
> How about not redirecting anything if neither logfile nor errlogfile is 
> set?

OK.
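
A rough sketch of that selection logic, assuming a hypothetical helper
build_redirect (not in the posted RA) that prints the redirection to
append to the command. The errlogfile-only branch is my assumption; the
posted code does not handle that case separately.

```shell
#!/bin/sh
# build_redirect LOGFILE ERRLOGFILE  (hypothetical helper, not in the posted RA)
# Prints the redirection to append to the command line. Prints nothing
# when neither file is set, so the output is left for lrmd to log.
build_redirect() {
    log=$1
    errlog=$2
    if [ -n "$log" ] && [ -n "$errlog" ]; then
        # both set: stdout and stderr go to different files
        echo ">> $log 2>> $errlog"
    elif [ -n "$log" ]; then
        # only logfile set: both streams go to the same file
        echo ">> $log 2>&1"
    elif [ -n "$errlog" ]; then
        # assumption: only stderr is redirected in this case
        echo "2>> $errlog"
    fi
}
```

The result could then be appended to the command string before it is
handed to su.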

>>> It stops the command with kill. If kill does not work, it uses kill -9.
>> I guess that you mean kill without options which translates to
>> kill -TERM.
>
> Right.
>
>>> Monitors are done with ps. No deep check here but a pointer where to
>>> implement that if needed.
>> Perhaps implement a monitor script hook, such as the one in Xen.
>> That way one keeps the RA intact.
>
> I will look at that and see how that works.
>
>> Why not use the pid file to check if the process is running?
>> Did you check start-stop-daemon? I'm not sure if we can use it,
>> since it's Linux specific, but there are certainly a few good
>> tips :)
>
> Afaik, not all programs write PID files. I.e., none of my custom programs
> do. If there's a way to generate one after starting $binfile - let me
> know.

Right. start-stop-daemon can create the PID file itself. When you
start a program in the background (with nohup ... &), $! contains
the PID of the process.
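
A minimal sketch of that, assuming a hypothetical helper
start_with_pidfile and a PID file path of your choosing (neither is in
the posted RA). If the program is started through su -c, the echo $!
would have to happen inside the command passed to su.

```shell
#!/bin/sh
# start_with_pidfile PIDFILE COMMAND [ARGS...]  (hypothetical helper)
# Starts COMMAND in the background and records its PID in PIDFILE.
start_with_pidfile() {
    pidfile=$1
    shift
    # nohup protects the process from SIGHUP; output is discarded in this sketch
    nohup "$@" >/dev/null 2>&1 &
    # $! holds the PID of the most recently started background command
    echo $! > "$pidfile"
}

# Example: start "sleep 30" and remember its PID
demo_pidfile="${TMPDIR:-/tmp}/anything_demo_$$.pid"
start_with_pidfile "$demo_pidfile" sleep 30
```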

>>> #!/bin/bash
>> Better to use #!/bin/sh. As of the next release, Debian will
>> distribute dash as the default shell. I think that ubuntu already
>> does that.
>
> K.
>
>>> #
>>> #       OCF Resource Agent compliant resource script.
>>> #
>>> # Copyright (c) 2008 IN-telegence GmbH & Co. KG, Dominik Klein
>>> #                    All Rights Reserved.
>>> #
>>> # This program is free software; you can redistribute it and/or modify
>>> # it under the terms of version 2 of the GNU General Public License as
>>> # published by the Free Software Foundation.
>>> #
>>> # This program is distributed in the hope that it would be useful, but
>>> # WITHOUT ANY WARRANTY; without even the implied warranty of
>>> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>> #
>>> # Further, this software is distributed without any warranty that it is
>>> # free of the rightful claim of any third person regarding infringement
>>> # or the like.  Any license provided herein, whether implied or
>>> # otherwise, applies only to this software file.  Patent licenses, if
>>> # any, provided herein do not apply to combinations of this program with
>>> # other software, or any other product whatsoever.
>>> #
>>> # You should have received a copy of the GNU General Public License
>>> # along with this program; if not, write the Free Software Foundation,
>>> # Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
>>>
>>> # OCF instance parameters
>>> #       OCF_RESKEY_binfile
>>> #       OCF_RESKEY_cmdline_options
>>> #       OCF_RESKEY_logfile
>>> #       OCF_RESKEY_errlogfile
>>> #       OCF_RESKEY_user
>>>
>>> # Initialization:
>>> . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
>>>
>>> anything_status() {
>>>     [ -n "$cmdline_options" ] && cmd="$binfile $cmdline_options" || cmd="$binfile"
>>>     if pgrep -u $user -f "$cmd" > /dev/null 2>&1
>> function number { # make sure that the file contains a number
>>      grep '^[0-9][0-9]*$' $1
>> }
>> if test -f $PIDFILE && pid=`number $PIDFILE` && kill -0 $pid
>
> where/how do you get $PIDFILE?
>
> And it should be quoted I guess.

PIDFILE could be supplied in the CIB. Or use some scheme based on the
resource id. There's a variable, HA_RSCTMP, which contains the directory
where such files may be stored.
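
Something along these lines, as a sketch only. OCF_RESKEY_pidfile is a
hypothetical parameter name, the file name scheme is made up, and the
defaults at the top are assumptions so that the snippet can run outside
the cluster.

```shell
#!/bin/sh
# Assumed defaults for running outside the cluster; inside an RA these
# come from the environment and the sourced shell functions.
: "${HA_RSCTMP:=/tmp}"
: "${OCF_RESOURCE_INSTANCE:=anything_demo}"
: "${OCF_SUCCESS:=0}"
: "${OCF_NOT_RUNNING:=7}"

# Take the PID file from the (hypothetical) pidfile parameter if set,
# otherwise derive it from the resource id.
PIDFILE="${OCF_RESKEY_pidfile:-$HA_RSCTMP/anything_$OCF_RESOURCE_INSTANCE.pid}"

# Print the file's content only if it consists of a single number
number() {
    grep '^[0-9][0-9]*$' "$1"
}

anything_status() {
    # kill -0 sends no signal; it only checks that the process exists
    if test -f "$PIDFILE" && pid=$(number "$PIDFILE") && kill -0 "$pid" 2>/dev/null
    then
        return $OCF_SUCCESS
    else
        return $OCF_NOT_RUNNING
    fi
}
```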

>>>     then
>>>             return $OCF_RUNNING
>>>     else
>>>             return $OCF_NOT_RUNNING
>>>     fi
>>> }
>>>
>>> anything_start() {
>>>     if ! anything_status
>>>     then
>>>             if [ -n "$logfile" -a -n "$errlogfile" ]
>>>             then
>>>                     # We have logfile and errlogfile, so redirect STDOUT and STDERR to different files
>>>                     cmd="su - $user -c \"nohup $binfile $cmdline_options >> $logfile 2>> $errlogfile &\""
>>>             else if [ -n "$logfile" ]
>>>                     then
>>>                             # We only have logfile so redirect STDOUT and STDERR to the same file
>>>                             cmd="su - $user -c \"nohup $binfile $cmdline_options >> $logfile 2>&1 &\""
>>>                     else
>>>                             # We have neither logfile nor errlogfile, so redirect STDOUT and STDERR to a generic logfile
>>>                             cmd="su - $user -c \"nohup $binfile $cmdline_options >> /var/log/$(basename $binfile)\_$(date +%Y.%m.%d) 2>&1 &\""
>> As I said above, I'd leave at least this part out. Perhaps also
>> all the logfile business.
>
> See my suggestion above. I think that's the most flexible way.
>
>>>                     fi
>>>             fi
>>>             ocf_log debug "Starting $process: $cmd"
>>>             # Execute the command as created above
>>>             eval $cmd
>>>             if anything_status
>>>             then
>>>                     ocf_log debug "$process: $cmd started successfully"
>>>                     return $OCF_SUCCESS
>>>             else
>>>                     ocf_log err "$process: $cmd could not be started"
>>>                     return $OCF_ERR_GENERIC
>>>             fi
>>>     else
>>>             # If already running, consider start successful
>>>             ocf_log debug "$process: $cmd is already running"
>>>             return $OCF_SUCCESS
>>>     fi
>>> }
>>>
>>> anything_stop() {
>>>     if anything_status
>>>     then
>>>             tries=5
>>>             i=0
>>>             while [ $i -lt $tries ]
>>>             do
>>>                     # there may be programs without command line options
>>>                     [ -n "$cmdline_options" ] && cmd="$binfile $cmdline_options" || cmd="$binfile"
>>>                     pkill -u $user -f "$cmd"
>> It should be enough to send the signal once. So, pkill should be
>> moved out of the loop.
>
> Right.
>
>>>                     sleep 1
>>>                     if ! anything_status
>>>                     then
>>>                             return $OCF_SUCCESS
>>>                     fi
>>>                     let "i++"
>>>             done
>> It is arguably wrong to limit the time to stop. One should let
>> the user do that by specifying the operation timeout. OTOH, I
>> believe that most resource agents are doing this in a similar
>> way, which still doesn't make it right. Then again, the agent
>> should try 'kill -9' eventually. To make this perfect, one would
>> need to define two timeouts: one for regular stop and one for
>> oh-yes-you'll-be-stopped-no-matter-what.
>
> So how about adding "tries" with a more intuitive name as a configuration
> option. It could default to 10000 or something and be overridden if it is
> set. While unset, kill -9 would never be used. If set, people need to set
> it lower than the stop timeout to make it work. I guess from inside the RA,
> the timeout is not visible, so it should be made clear in the header of the
> script and in the meta-data.

Yes, that could work.
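
Roughly like this, as a sketch. stop_pid and force_after are hypothetical
names, the example works on a PID rather than on pkill patterns, and it
assumes we are allowed to signal the process.

```shell
#!/bin/sh
# stop_pid PID FORCE_AFTER  (hypothetical helper)
# Sends SIGTERM once, waits up to FORCE_AFTER seconds, then sends SIGKILL.
# FORCE_AFTER has to be lower than the stop operation timeout.
# Returns 0 if the process is gone, 1 otherwise.
stop_pid() {
    pid=$1
    force_after=$2
    # SIGTERM is sent exactly once; if this fails the process is already gone
    kill "$pid" 2>/dev/null || return 0
    i=0
    while [ "$i" -lt "$force_after" ]
    do
        sleep 1
        # kill -0 only tests for existence
        kill -0 "$pid" 2>/dev/null || return 0
        i=$((i + 1))
    done
    # SIGTERM did not do the job in time, escalate
    kill -9 "$pid" 2>/dev/null
    sleep 1
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

With force_after unset or very large, the operation timeout set by the
user would be the only limit and SIGKILL would never be sent.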

>>>             # one last attempt with sigkill
>>>             ocf_log warn "Stop $process: Looks like $process could not be stopped by SIGTERM, now sending SIGKILL"
>>>             ocf_log warn "$(pgrep -u $user -f \"$cmd\" -l)"
>>>             pkill -u $user -9 -f "$cmd"
>>>             if ! anything_status
>>>             then
>>>                     ocf_log debug "Stop $process: Seems like SIGKILL did the job"
>>>                     ocf_log debug "$(pgrep -u $user -f \"$cmd\" -l)"
>>>                     return $OCF_SUCCESS
>>>             else
>>>                     ocf_log err "Stop $process: failed"
>>>                     return $OCF_ERR_GENERIC
>>>             fi
>>>     else
>>>             # was not running, so stop can be considered successful
>>>             return $OCF_SUCCESS
>>>     fi
>>>     return $OCF_ERR_GENERIC
>>> }
>>>
>>> anything_monitor() {
>>>     anything_status
>>>     ret=$?
>>>     if [ $ret -eq $OCF_SUCCESS ]
>>>     then
>>>             # implement your deeper monitor operation here
>> if [ -n "$OCF_RESKEY_monitor_hook" ]; then
>>      eval "$OCF_RESKEY_monitor_hook"
>> else
>>      true
>> fi
>
> Oh, you already did what I wanted to look up. Thanks :)
>
>>>             return $OCF_SUCCESS
>> return               # $? is implied
>
> See, learned something - again :)

:) I edited it because it should return the exit code of the
monitor_hook script. Otherwise, return $OCF_SUCCESS would be OK.
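
To illustrate the bare return, here is the hook part on its own.
run_monitor_hook is a hypothetical function name; monitor_hook is the
parameter suggested above.

```shell
#!/bin/sh
# run_monitor_hook  (hypothetical function name)
# Runs the optional deep check given in the monitor_hook parameter.
run_monitor_hook() {
    if [ -n "${OCF_RESKEY_monitor_hook:-}" ]; then
        eval "$OCF_RESKEY_monitor_hook"
    else
        true
    fi
    # no argument: returns the status of the last command run above,
    # i.e. the exit code of the hook, or 0 if no hook is set
    return
}
```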

Cheers,

Dejan
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
