On 3/10/06, Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote:
> On 2006-03-09T05:55:11, [email protected] wrote:
>
> > linux-ha CVS committal
> >
> > Author : xunsun
> > Host :
> > Project : linux-ha
> > Module : resources
> >
> > Dir : linux-ha/resources/OCF
> >
> >
> > Modified Files:
> > AudibleAlarm.in ClusterMon.in Delay.in Dummy.in Filesystem.in
> > ICP.in IPaddr.in IPaddr2.in IPsrcaddr.in IPv6addr.c LVM.in
> > LinuxSCSI.in MailTo.in Raid1.in ServeRAID.in WAS.in
> > WinPopup.in Xinetd.in apache.in db2.in drbd.in
> > ocf-shellfuncs.in portblock.in
> >
> >
> > Log Message:
> > *moved run() function to ocf-shellfuncs
> > *added logic to support RA states OCF_RUNNING_MASTER and OCF_FAILED_MASTER
>
> run() maybe, but "return_master" looks extremly broken and the useage of
> it extremly weird.
I admit 'return_master' is broken after reading your mail :( And it
should not exist.
>
> How have these changes been tested?
>
> Some highlights, just flagged once for each issue, they tend to occur
> repeatedly:
>
> > ===================================================================
> > RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/ClusterMon.in,v
> > retrieving revision 1.8
> > retrieving revision 1.9
> > diff -u -3 -r1.8 -r1.9
> > --- ClusterMon.in 2 Feb 2006 14:21:08 -0000 1.8
> > +++ ClusterMon.in 9 Mar 2006 12:55:10 -0000 1.9
> > @@ -1,6 +1,6 @@
> > #!/bin/sh
> > #
> > -# $Id: ClusterMon.in,v 1.8 2006/02/02 14:21:08 andrew Exp $
> > +# $Id: ClusterMon.in,v 1.9 2006/03/09 12:55:10 xunsun Exp $
> > #
> > # ClusterMon OCF RA. Does nothing but wait a few seconds, can be
> > # configured to fail occassionally.
> > @@ -159,27 +159,12 @@
> > if [ ! -z $pid ]; then
> > kill -0 $pid
> > if [ $? = 0 ]; then
> > - exit $OCF_SUCCESS
> > + return `return_master $OCF_SUCCESS`
>
> uh? return_master doesn't output anything. Why the ``?
>
> Besides,
> ClusterMon IS NOT A MULTISTATE CAPABLE AGENT! Why it should EVER return
> a master state is beyond me.
If it's not a multi-state resource, and crm doesn't try to probe it,
it will return what it is expected to do. I admit here that I don't
completely know the requirement of a RA to be multi-state capable (is
it documented somewhere?).
I did this because I wanted these scripts to behave similarly. I will
undo this change, sorry for the trouble.
>
> > # Check the update interval
> > - if CheckInterval "$OCF_RESKEY_update"; then
> > + if ocf_is_decimal "$OCF_RESKEY_update" && [ $OCF_RESKEY_update -gt 0
> > ]; then
> > :
> > else
> > ocf_log err "Invalid update interval $OCF_RESKEY_update. It should be
> > positive integer!"
> > @@ -278,4 +263,4 @@
> > ;;
> > esac
> >
> > -exit $OCF_SUCCESS
> > +exit $?
>
> Pointless change; this is actually never reached, and if it is, it
> certainly is an error, if you want to change it to something else ;-)
This is reachable as a result of the (bad) change in ClusterMon_monitor.
>
> > ===================================================================
> > RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/Delay.in,v
> > retrieving revision 1.9
> > retrieving revision 1.10
> > diff -u -3 -r1.9 -r1.10
> > --- Delay.in 11 Feb 2006 14:21:20 -0000 1.9
> > +++ Delay.in 9 Mar 2006 12:55:10 -0000 1.10
> > @@ -102,7 +102,7 @@
> > Delay_stat
> > then
> > ocf_log info "Delay is running OK"
> > - return $OCF_SUCCESS
> > + return_master $OCF_SUCCESS
> > else
> > ocf_log info "Delay is stopped"
> > return $OCF_NOT_RUNNING
>
> Not multi-state capable.
OK, I will undo this.
>
> > ===================================================================
> > RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/Dummy.in,v
> > retrieving revision 1.2
> > retrieving revision 1.3
> > diff -u -3 -r1.2 -r1.3
> > --- Dummy.in 3 Jun 2005 03:27:31 -0000 1.2
> > +++ Dummy.in 9 Mar 2006 12:55:10 -0000 1.3
> > @@ -1,6 +1,6 @@
> > #!/bin/sh
> > #
> > -# $Id: Dummy.in,v 1.2 2005/06/03 03:27:31 sunjd Exp $
> > +# $Id: Dummy.in,v 1.3 2006/03/09 12:55:10 xunsun Exp $
> > #
> > # Dummy OCF RA. Does nothing but wait a few seconds, can be
> > # configured to fail occassionally.
> > @@ -114,6 +114,7 @@
> >
> > dummy_monitor() {
> > sleep $OCF_RESKEY_monitor_delay
> > + return_monitor $?
> > }
>
> This very clearly has never been tested. I leave it to the reader to
> figure out why this is so clearly obvious.
Yes, sorry. Will undo it.
>
> > ===================================================================
> > RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/Filesystem.in,v
> > retrieving revision 1.15
> > retrieving revision 1.16
> > diff -u -3 -r1.15 -r1.16
> > --- Filesystem.in 11 Feb 2006 13:40:42 -0000 1.15
> > +++ Filesystem.in 9 Mar 2006 12:55:10 -0000 1.16
> > @@ -1,6 +1,6 @@
> > #!/bin/sh
> > #
> > -# $Id: Filesystem.in,v 1.15 2006/02/11 13:40:42 xunsun Exp $
> > +# $Id: Filesystem.in,v 1.16 2006/03/09 12:55:10 xunsun Exp $
> > #
> > # Support: [EMAIL PROTECTED]
> > # License: GNU General Public License (GPL)
> > @@ -95,7 +95,7 @@
> > usage() {
> > cat <<-EOT
> > usage: $0 {start|stop|status|monitor|validate-all|meta-data}
> > - $Id: Filesystem.in,v 1.15 2006/02/11 13:40:42 xunsun Exp $
> > + $Id: Filesystem.in,v 1.16 2006/03/09 12:55:10 xunsun Exp $
> > EOT
> > }
> >
> > @@ -348,7 +348,7 @@
> > case "$OP" in
> > status) ocf_log info "$msg";;
> > esac
> > - return $rc
> > + return_master $rc
>
> Not a multi-state resource. (I get tired of writing this so this will be
> the last agent I'm flagging. But it's amazing you manage to break our
> second most important RA w/o testing it.)
>
> > --- IPv6addr.c 20 Dec 2005 08:34:47 -0000 1.9
> > +++ IPv6addr.c 9 Mar 2006 12:55:10 -0000 1.10
> > @@ -81,10 +81,15 @@
> > * return 0(OCF_SUCCESS) for response correctly.
> > * return 1(OCF_NOT_RUNNING) for no response.
> > * return 2(OCF_ERR_ARGS) for invalid or excess argument(s)
> > - *
> > + * return 8(OCF_RUNNING_MASTER) for running in "master" mode
> > */
> >
> > +#include <portability.h>
> > +#ifdef HAVE_CONFIG_H
> > #include <config.h>
> > +#endif
> > +
> > +#include <stdlib.h>
> > #include <sys/types.h>
> > #include <netinet/icmp6.h>
> > #include <libgen.h>
> > @@ -109,6 +114,8 @@
> > 5 program is not installed
> > 6 program is not configured
> > 7 program is not running
> > +8 resource is running in "master" mode and fully operational
> > +9 resource is in "master" mode but in a failed state
> > */
> > #define OCF_SUCCESS 0
> > #define OCF_ERR_GENERIC 1
> > @@ -118,6 +125,8 @@
> > #define OCF_ERR_INSTALLED 5
> > #define OCF_ERR_CONFIGURED 6
> > #define OCF_NOT_RUNNING 7
> > +#define OCF_RUNNING_MASTER 8
> > +#define OCF_FAILED_MASTER 9
>
> Actually it seems advisable to have an ocf_ra.h headerfile somewhere and
> define these there instead of duplicating LRM/CRM code within the RAs.
>
>
> > @@ -363,11 +372,14 @@
> > int
> > monitor_addr6(struct in6_addr* addr6, int prefix_len)
> > {
> > + char* interval;
> > if(0 == is_addr6_available(addr6)) {
> > + interval = getenv("OCF_RESKEY_interval");
> > + if (interval && atoi(interval) == 0)
> > + return OCF_RUNNING_MASTER;
>
> I don't even begin to understand the relevance of the interval here?
> Probing isn't anything special; this error code is relevant for monitor
> too. You really shouldn't care about this.
Isn't interval supposed the key to distinguish probe and monitor actions?
>
> (See below.)
>
> > --- ServeRAID.in 26 Jan 2006 18:00:05 -0000 1.8
> > +++ ServeRAID.in 9 Mar 2006 12:55:10 -0000 1.9
> > @@ -1,6 +1,6 @@
> > #!/bin/sh
> > #
> > -# $Id: ServeRAID.in,v 1.8 2006/01/26 18:00:05 lars Exp $
> > +# $Id: ServeRAID.in,v 1.9 2006/03/09 12:55:10 xunsun Exp $
> > #
> > # ServeRAID
> > #
> > @@ -104,7 +104,7 @@
> > You will need at least version 6.10 (~July 2003 release) of the
> > ipssend
> > command for this script to work.
> >
> > - $Id: ServeRAID.in,v 1.8 2006/01/26 18:00:05 lars Exp $
> > + $Id: ServeRAID.in,v 1.9 2006/03/09 12:55:10 xunsun Exp $
> > !
> > }
> >
> > @@ -210,30 +210,30 @@
> > # codes reversed - it returns 1 for success, and 0 for failure...
> > # We account for that too...
> > #
> > -run() {
> > - output=`"$@" 2>&1`
> > - rc=$?
> > - output=`echo $output`
> > - if
> > - [ $rc -eq $srsuccess ]
> > - then
> > - if
> > - [ ! -z "$output" ]
> > - then
> > - ocf_log "info" "$output"
> > - fi
> > - return 0
> > - else
> > - if
> > - [ ! -z "$output" ]
> > - then
> > - ocf_log "err" "$output"
> > - else
> > - ocf_log "err" "command failed: $*"
> > - fi
> > - return 1
> > - fi
> > -}
> > +#run() {
> > +# output=`"$@" 2>&1`
> > +# rc=$?
> > +# output=`echo $output`
> > +# if
> > +# [ $rc -eq $srsuccess ]
> > +# then
> > +# if
> > +# [ ! -z "$output" ]
> > +# then
> > +# ocf_log "info" "$output"
> > +# fi
> > +# return 0
> > +# else
> > +# if
> > +# [ ! -z "$output" ]
> > +# then
> > +# ocf_log "err" "$output"
> > +# else
> > +# ocf_log "err" "command failed: $*"
> > +# fi
> > +# return 1
> > +# fi
> > +#}
>
> Could you please decide whether you want to remove it like in the others
> or not? Why are you commenting this one out?
I did this because of the piece of comment on top of this function.
This run() function is slightly different because it has its success
status parameterized. I thought run() function would have to change if
ipssend reversed its return status again (stupid thought?)
>
> > SRLogicalDriveConfig() {
> > $IPS getconfig $sr_adapter ld
> > @@ -435,7 +435,7 @@
> > ServeRAID_status $serveraid $mergegroup
> > then
> > ocf_log debug "ServeRAID merge group $serveraid $mergegroup is
> > running."
> > - exit $OCF_SUCCESS
> > + return_master $OCF_SUCCESS
>
> This works (because of the later "exit $?") but is non-obvious code.
> You're merely setting the exit code, but not actually returning from the
> script, as this might be read as.
OK, I will keep in mind to write obvious command, thanks, and sorry for trouble.
>
> There's other instances of this fragment in this patch. I'm not flagging
> all of them.
>
> > ===================================================================
> > RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/apache.in,v
> > retrieving revision 1.14
> > retrieving revision 1.15
> > diff -u -3 -r1.14 -r1.15
> > --- apache.in 11 Feb 2006 14:34:11 -0000 1.14
> > +++ apache.in 9 Mar 2006 12:55:10 -0000 1.15
> > @@ -403,9 +376,10 @@
> > silent_status
> > then
> > run sh -c "$WGET $WGETOPTS $STATUSURL | grep -i '</ *body *></ *html
> > *>' >/dev/null"
> > + return_master $?
> > else
> > - ocf_log err "$CMD not running"
> > - return $OCF_ERR_GENERIC;
> > + ocf_log info "$CMD not running"
> > + return $OCF_NOT_RUNNING
> > fi
>
> The return_master is obviously wrong,
Because apache is not a master/slave resource? I will undo this
(actually almost anything here).
> the second part of the chunk is
> actually correct.
>
> > ===================================================================
> > RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/ocf-shellfuncs.in,v
> > retrieving revision 1.24
> > retrieving revision 1.25
> > diff -u -3 -r1.24 -r1.25
> > --- ocf-shellfuncs.in 18 Nov 2005 05:48:40 -0000 1.24
> > +++ ocf-shellfuncs.in 9 Mar 2006 12:55:10 -0000 1.25
> > @@ -1,5 +1,5 @@
> > #
> > -# $Id: ocf-shellfuncs.in,v 1.24 2005/11/18 05:48:40 alan Exp $
> > +# $Id: ocf-shellfuncs.in,v 1.25 2006/03/09 12:55:10 xunsun Exp $
> > #
> > # Common helper functions for the OCF Resource Agents supplied by
> > # heartbeat.
> > @@ -208,4 +208,46 @@
> > ha_log "${__OCF_PRIO}: $__OCF_MSG"
> > }
> >
> > +return_master() {
> > + if [ $# -ne 1 ]; then
> > + return $OCF_ERR_GENERIC
> > + fi
> > +
> > + rc=$1
> > + if [ "${__OCF_ACTION}" = "monitor" -a "${OCF_RESKEY_interval}" = "0"
> > ]; then
> > + if [ "$rc" = "$OCF_SUCCESS" ]; then
> > + return $OCF_RUNNING_MASTER
> > + fi
> > +
> > + if [ "$rc" = "$OCF_ERR_GENERIC" ]; then
> > + return $OCF_FAILED_MASTER
> > + fi
> > + fi
> > +
> > + return $rc
> > +}
>
> How does the RESKEY_interval figure in here? It really shouldn't matter.
>
> If it actually _does_ matter (ie, monitor supposed to return different
> things at probe time than at runtime), I'll whack Andrew for you.
>
> But this is entirely broken, called in all the wrong places, and not
> just that, but a wrong idiom too.
>
> A RA which _does_ support multi-state will know when to return
> RUNNING_MASTER or FAILED_MASTER in its monitor function, and it'll be
> much better off doing so directly than jumping through this ugly
> indirection.
>
> This entire patch (except for the very few legit chunks of bugfixes and
> the run() consolidation maybe, but if you move it to ocf-shellfuncs, it
> should be called ocf_run() for consistency) should be backed out with
> prejudice.
Sorry for the trouble, I will back the patch out.
>
>
>
> Sincerely,
> Lars Marowsky-Br�e
>
> --
> High Availability & Clustering
> SUSE Labs, Research and Development
> SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin
> "Ignorance more frequently begets confidence than does knowledge"
>
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
>
--
Thanks & regards
Xun Sun
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/