On 3/10/06, Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote:
> On 2006-03-09T05:55:11, [email protected] wrote:
>
> > linux-ha CVS committal
> >
> > Author  : xunsun
> > Host    :
> > Project : linux-ha
> > Module  : resources
> >
> > Dir     : linux-ha/resources/OCF
> >
> >
> > Modified Files:
> >       AudibleAlarm.in ClusterMon.in Delay.in Dummy.in Filesystem.in
> >       ICP.in IPaddr.in IPaddr2.in IPsrcaddr.in IPv6addr.c LVM.in
> >       LinuxSCSI.in MailTo.in Raid1.in ServeRAID.in WAS.in
> >       WinPopup.in Xinetd.in apache.in db2.in drbd.in
> >       ocf-shellfuncs.in portblock.in
> >
> >
> > Log Message:
> > *moved run() function to ocf-shellfuncs
> > *added logic to support RA states OCF_RUNNING_MASTER and OCF_FAILED_MASTER
>
> run() maybe, but "return_master" looks extremly broken and the useage of
> it extremly weird.

I admit 'return_master' is broken after reading your mail :( And it
should not exist.

>
> How have these changes been tested?
>
> Some highlights, just flagged once for each issue, they tend to occur
> repeatedly:
>
> > ===================================================================
> > RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/ClusterMon.in,v
> > retrieving revision 1.8
> > retrieving revision 1.9
> > diff -u -3 -r1.8 -r1.9
> > --- ClusterMon.in     2 Feb 2006 14:21:08 -0000       1.8
> > +++ ClusterMon.in     9 Mar 2006 12:55:10 -0000       1.9
> > @@ -1,6 +1,6 @@
> >  #!/bin/sh
> >  #
> > -#    $Id: ClusterMon.in,v 1.8 2006/02/02 14:21:08 andrew Exp $
> > +#    $Id: ClusterMon.in,v 1.9 2006/03/09 12:55:10 xunsun Exp $
> >  #
> >  #    ClusterMon OCF RA. Does nothing but wait a few seconds, can be
> >  #    configured to fail occassionally.
> > @@ -159,27 +159,12 @@
> >      if [ ! -z $pid ]; then
> >       kill -0 $pid
> >       if [ $? = 0 ]; then
> > -         exit $OCF_SUCCESS
> > +         return `return_master $OCF_SUCCESS`
>
> uh? return_master doesn't output anything. Why the ``?
>
> Besides,
> ClusterMon IS NOT A MULTISTATE CAPABLE AGENT! Why it should EVER return
> a master state is beyond me.

If it's not a multi-state resource, and crm doesn't try to probe it,
it will return what it is expected to do. I admit here that I don't
completely know the requirement of a RA to be multi-state capable (is
it documented somewhere?).

I did this because I wanted these scripts to behave similarly. I will
undo this change, sorry for the trouble.
>
> >  # Check the update interval
> > -    if CheckInterval "$OCF_RESKEY_update"; then
> > +    if ocf_is_decimal "$OCF_RESKEY_update" && [ $OCF_RESKEY_update -gt 0 
> > ]; then
> >       :
> >      else
> >       ocf_log err "Invalid update interval $OCF_RESKEY_update. It should be 
> > positive integer!"
> > @@ -278,4 +263,4 @@
> >               ;;
> >  esac
> >
> > -exit $OCF_SUCCESS
> > +exit $?
>
> Pointless change; this is actually never reached, and if it is, it
> certainly is an error, if you want to change it to something else ;-)

This is reachable as a result of the (bad) change in ClusterMon_monitor.
>
> > ===================================================================
> > RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/Delay.in,v
> > retrieving revision 1.9
> > retrieving revision 1.10
> > diff -u -3 -r1.9 -r1.10
> > --- Delay.in  11 Feb 2006 14:21:20 -0000      1.9
> > +++ Delay.in  9 Mar 2006 12:55:10 -0000       1.10
> > @@ -102,7 +102,7 @@
> >      Delay_stat
> >    then
> >      ocf_log info "Delay is running OK"
> > -    return $OCF_SUCCESS
> > +    return_master $OCF_SUCCESS
> >    else
> >      ocf_log info "Delay is stopped"
> >      return $OCF_NOT_RUNNING
>
> Not multi-state capable.

OK, I will undo this.
>
> > ===================================================================
> > RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/Dummy.in,v
> > retrieving revision 1.2
> > retrieving revision 1.3
> > diff -u -3 -r1.2 -r1.3
> > --- Dummy.in  3 Jun 2005 03:27:31 -0000       1.2
> > +++ Dummy.in  9 Mar 2006 12:55:10 -0000       1.3
> > @@ -1,6 +1,6 @@
> >  #!/bin/sh
> >  #
> > -#    $Id: Dummy.in,v 1.2 2005/06/03 03:27:31 sunjd Exp $
> > +#    $Id: Dummy.in,v 1.3 2006/03/09 12:55:10 xunsun Exp $
> >  #
> >  #    Dummy OCF RA. Does nothing but wait a few seconds, can be
> >  #    configured to fail occassionally.
> > @@ -114,6 +114,7 @@
> >
> >  dummy_monitor() {
> >       sleep $OCF_RESKEY_monitor_delay
> > +     return_monitor $?
> >  }
>
> This very clearly has never been tested. I leave it to the reader to
> figure out why this is so clearly obvious.

Yes, sorry. Will undo it.
>
> > ===================================================================
> > RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/Filesystem.in,v
> > retrieving revision 1.15
> > retrieving revision 1.16
> > diff -u -3 -r1.15 -r1.16
> > --- Filesystem.in     11 Feb 2006 13:40:42 -0000      1.15
> > +++ Filesystem.in     9 Mar 2006 12:55:10 -0000       1.16
> > @@ -1,6 +1,6 @@
> >  #!/bin/sh
> >  #
> > -# $Id: Filesystem.in,v 1.15 2006/02/11 13:40:42 xunsun Exp $
> > +# $Id: Filesystem.in,v 1.16 2006/03/09 12:55:10 xunsun Exp $
> >  #
> >  # Support:      [EMAIL PROTECTED]
> >  # License:      GNU General Public License (GPL)
> > @@ -95,7 +95,7 @@
> >  usage() {
> >       cat <<-EOT
> >       usage: $0 {start|stop|status|monitor|validate-all|meta-data}
> > -     $Id: Filesystem.in,v 1.15 2006/02/11 13:40:42 xunsun Exp $
> > +     $Id: Filesystem.in,v 1.16 2006/03/09 12:55:10 xunsun Exp $
> >       EOT
> >  }
> >
> > @@ -348,7 +348,7 @@
> >          case "$OP" in
> >         status)       ocf_log info "$msg";;
> >       esac
> > -        return $rc
> > +        return_master $rc
>
> Not a multi-state resource. (I get tired of writing this so this will be
> the last agent I'm flagging. But it's amazing you manage to break our
> second most important RA w/o testing it.)
>
> > --- IPv6addr.c        20 Dec 2005 08:34:47 -0000      1.9
> > +++ IPv6addr.c        9 Mar 2006 12:55:10 -0000       1.10
> > @@ -81,10 +81,15 @@
> >   *   return 0(OCF_SUCCESS) for response correctly.
> >   *   return 1(OCF_NOT_RUNNING) for no response.
> >   *   return 2(OCF_ERR_ARGS) for invalid or excess argument(s)
> > - *
> > + *   return 8(OCF_RUNNING_MASTER) for running in "master" mode
> >   */
> >
> > +#include <portability.h>
> > +#ifdef HAVE_CONFIG_H
> >  #include <config.h>
> > +#endif
> > +
> > +#include <stdlib.h>
> >  #include <sys/types.h>
> >  #include <netinet/icmp6.h>
> >  #include <libgen.h>
> > @@ -109,6 +114,8 @@
> >  5    program is not installed
> >  6    program is not configured
> >  7    program is not running
> > +8    resource is running in "master" mode and fully operational
> > +9    resource is in "master" mode but in a failed state
> >  */
> >  #define      OCF_SUCCESS             0
> >  #define      OCF_ERR_GENERIC         1
> > @@ -118,6 +125,8 @@
> >  #define      OCF_ERR_INSTALLED       5
> >  #define      OCF_ERR_CONFIGURED      6
> >  #define      OCF_NOT_RUNNING         7
> > +#define OCF_RUNNING_MASTER   8
> > +#define OCF_FAILED_MASTER    9
>
> Actually it seems advisable to have an ocf_ra.h headerfile somewhere and
> define these there instead of duplicating LRM/CRM code within the RAs.
>
>
> > @@ -363,11 +372,14 @@
> >  int
> >  monitor_addr6(struct in6_addr* addr6, int prefix_len)
> >  {
> > +     char*   interval;
> >       if(0 == is_addr6_available(addr6)) {
> > +             interval = getenv("OCF_RESKEY_interval");
> > +             if (interval && atoi(interval) == 0)
> > +                     return OCF_RUNNING_MASTER;
>
> I don't even begin to understand the relevance of the interval here?
> Probing isn't anything special; this error code is relevant for monitor
> too. You really shouldn't care about this.

Isn't interval supposed the key to distinguish probe and monitor actions?
>
> (See below.)
>
> > --- ServeRAID.in      26 Jan 2006 18:00:05 -0000      1.8
> > +++ ServeRAID.in      9 Mar 2006 12:55:10 -0000       1.9
> > @@ -1,6 +1,6 @@
> >  #!/bin/sh
> >  #
> > -# $Id: ServeRAID.in,v 1.8 2006/01/26 18:00:05 lars Exp $
> > +# $Id: ServeRAID.in,v 1.9 2006/03/09 12:55:10 xunsun Exp $
> >  #
> >  # ServeRAID
> >  #
> > @@ -104,7 +104,7 @@
> >       You will need at least version 6.10 (~July 2003 release) of the 
> > ipssend
> >       command for this script to work.
> >
> > -     $Id: ServeRAID.in,v 1.8 2006/01/26 18:00:05 lars Exp $
> > +     $Id: ServeRAID.in,v 1.9 2006/03/09 12:55:10 xunsun Exp $
> >       !
> >  }
> >
> > @@ -210,30 +210,30 @@
> >  #    codes reversed - it returns 1 for success, and 0 for failure...
> >  #    We account for that too...
> >  #
> > -run() {
> > -  output=`"$@" 2>&1`
> > -  rc=$?
> > -  output=`echo $output`
> > -  if
> > -    [ $rc -eq $srsuccess ]
> > -  then
> > -    if
> > -      [ ! -z "$output" ]
> > -    then
> > -      ocf_log "info" "$output"
> > -    fi
> > -    return 0
> > -  else
> > -    if
> > -      [ ! -z "$output" ]
> > -    then
> > -      ocf_log "err" "$output"
> > -    else
> > -      ocf_log "err"  "command failed: $*"
> > -    fi
> > -    return 1
> > -  fi
> > -}
> > +#run() {
> > +#  output=`"$@" 2>&1`
> > +#  rc=$?
> > +#  output=`echo $output`
> > +#  if
> > +#    [ $rc -eq $srsuccess ]
> > +#  then
> > +#    if
> > +#      [ ! -z "$output" ]
> > +#    then
> > +#      ocf_log "info" "$output"
> > +#    fi
> > +#    return 0
> > +#  else
> > +#    if
> > +#      [ ! -z "$output" ]
> > +#    then
> > +#      ocf_log "err" "$output"
> > +#    else
> > +#      ocf_log "err"  "command failed: $*"
> > +#    fi
> > +#    return 1
> > +#  fi
> > +#}
>
> Could you please decide whether you want to remove it like in the others
> or not? Why are you commenting this one out?

I did this because of the piece of comment on top of this function.
This run() function is slightly different because it has its success
status parameterized. I thought run() function would have to change if
ipssend reversed its return status again (stupid thought?)
>
> >  SRLogicalDriveConfig() {
> >    $IPS getconfig $sr_adapter ld
> > @@ -435,7 +435,7 @@
> >         ServeRAID_status $serveraid $mergegroup
> >       then
> >         ocf_log debug "ServeRAID merge group $serveraid $mergegroup is 
> > running."
> > -       exit $OCF_SUCCESS
> > +       return_master $OCF_SUCCESS
>
> This works (because of the later "exit $?") but is non-obvious code.
> You're merely setting the exit code, but not actually returning from the
> script, as this might be read as.

OK, I will keep in mind to write obvious command, thanks, and sorry for trouble.
>
> There's other instances of this fragment in this patch. I'm not flagging
> all of them.
>
> > ===================================================================
> > RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/apache.in,v
> > retrieving revision 1.14
> > retrieving revision 1.15
> > diff -u -3 -r1.14 -r1.15
> > --- apache.in 11 Feb 2006 14:34:11 -0000      1.14
> > +++ apache.in 9 Mar 2006 12:55:10 -0000       1.15
> > @@ -403,9 +376,10 @@
> >      silent_status
> >    then
> >      run sh -c "$WGET $WGETOPTS $STATUSURL | grep -i '</ *body *></ *html 
> > *>' >/dev/null"
> > +    return_master $?
> >    else
> > -    ocf_log err "$CMD not running"
> > -    return $OCF_ERR_GENERIC;
> > +    ocf_log info "$CMD not running"
> > +    return $OCF_NOT_RUNNING
> >    fi
>
> The return_master is obviously wrong,

Because apache is not a master/slave resource? I will undo this
(actually almost anything here).

> the second part of the chunk is
> actually correct.
>
> > ===================================================================
> > RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/ocf-shellfuncs.in,v
> > retrieving revision 1.24
> > retrieving revision 1.25
> > diff -u -3 -r1.24 -r1.25
> > --- ocf-shellfuncs.in 18 Nov 2005 05:48:40 -0000      1.24
> > +++ ocf-shellfuncs.in 9 Mar 2006 12:55:10 -0000       1.25
> > @@ -1,5 +1,5 @@
> >  #
> > -#    $Id: ocf-shellfuncs.in,v 1.24 2005/11/18 05:48:40 alan Exp $
> > +#    $Id: ocf-shellfuncs.in,v 1.25 2006/03/09 12:55:10 xunsun Exp $
> >  #
> >  #    Common helper functions for the OCF Resource Agents supplied by
> >  #    heartbeat.
> > @@ -208,4 +208,46 @@
> >       ha_log "${__OCF_PRIO}: $__OCF_MSG"
> >  }
> >
> > +return_master() {
> > +     if [ $# -ne 1 ]; then
> > +         return $OCF_ERR_GENERIC
> > +     fi
> > +
> > +     rc=$1
> > +     if [ "${__OCF_ACTION}" = "monitor" -a "${OCF_RESKEY_interval}" = "0" 
> > ]; then
> > +         if [ "$rc" = "$OCF_SUCCESS" ]; then
> > +             return $OCF_RUNNING_MASTER
> > +         fi
> > +
> > +         if [ "$rc" = "$OCF_ERR_GENERIC" ]; then
> > +             return $OCF_FAILED_MASTER
> > +         fi
> > +     fi
> > +
> > +     return $rc
> > +}
>
> How does the RESKEY_interval figure in here? It really shouldn't matter.
>
> If it actually _does_ matter (ie, monitor supposed to return different
> things at probe time than at runtime), I'll whack Andrew for you.
>
> But this is entirely broken, called in all the wrong places, and not
> just that, but a wrong idiom too.
>
> A RA which _does_ support multi-state will know when to return
> RUNNING_MASTER or FAILED_MASTER in its monitor function, and it'll be
> much better off doing so directly than jumping through this ugly
> indirection.
>
> This entire patch (except for the very few legit chunks of bugfixes and
> the run() consolidation maybe, but if you move it to ocf-shellfuncs, it
> should be called ocf_run() for consistency) should be backed out with
> prejudice.

Sorry for the trouble, I will back the patch out.
>
>
>
> Sincerely,
>     Lars Marowsky-Br�e
>
> --
> High Availability & Clustering
> SUSE Labs, Research and Development
> SUSE LINUX Products GmbH - A Novell Business     -- Charles Darwin
> "Ignorance more frequently begets confidence than does knowledge"
>
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
>
--
Thanks & regards
Xun Sun
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

Reply via email to