On 2006-03-09T05:55:11, [email protected] wrote:
> linux-ha CVS committal
>
> Author : xunsun
> Host :
> Project : linux-ha
> Module : resources
>
> Dir : linux-ha/resources/OCF
>
>
> Modified Files:
> AudibleAlarm.in ClusterMon.in Delay.in Dummy.in Filesystem.in
> ICP.in IPaddr.in IPaddr2.in IPsrcaddr.in IPv6addr.c LVM.in
> LinuxSCSI.in MailTo.in Raid1.in ServeRAID.in WAS.in
> WinPopup.in Xinetd.in apache.in db2.in drbd.in
> ocf-shellfuncs.in portblock.in
>
>
> Log Message:
> *moved run() function to ocf-shellfuncs
> *added logic to support RA states OCF_RUNNING_MASTER and OCF_FAILED_MASTER
run() maybe, but "return_master" looks extremly broken and the useage of
it extremly weird.
How have these changes been tested?
Some highlights, just flagged once for each issue, they tend to occur
repeatedly:
> ===================================================================
> RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/ClusterMon.in,v
> retrieving revision 1.8
> retrieving revision 1.9
> diff -u -3 -r1.8 -r1.9
> --- ClusterMon.in 2 Feb 2006 14:21:08 -0000 1.8
> +++ ClusterMon.in 9 Mar 2006 12:55:10 -0000 1.9
> @@ -1,6 +1,6 @@
> #!/bin/sh
> #
> -# $Id: ClusterMon.in,v 1.8 2006/02/02 14:21:08 andrew Exp $
> +# $Id: ClusterMon.in,v 1.9 2006/03/09 12:55:10 xunsun Exp $
> #
> # ClusterMon OCF RA. Does nothing but wait a few seconds, can be
> # configured to fail occassionally.
> @@ -159,27 +159,12 @@
> if [ ! -z $pid ]; then
> kill -0 $pid
> if [ $? = 0 ]; then
> - exit $OCF_SUCCESS
> + return `return_master $OCF_SUCCESS`
uh? return_master doesn't output anything. Why the ``?
Besides,
ClusterMon IS NOT A MULTISTATE CAPABLE AGENT! Why it should EVER return
a master state is beyond me.
> # Check the update interval
> - if CheckInterval "$OCF_RESKEY_update"; then
> + if ocf_is_decimal "$OCF_RESKEY_update" && [ $OCF_RESKEY_update -gt 0 ];
> then
> :
> else
> ocf_log err "Invalid update interval $OCF_RESKEY_update. It should be
> positive integer!"
> @@ -278,4 +263,4 @@
> ;;
> esac
>
> -exit $OCF_SUCCESS
> +exit $?
Pointless change; this is actually never reached, and if it is, it
certainly is an error, if you want to change it to something else ;-)
> ===================================================================
> RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/Delay.in,v
> retrieving revision 1.9
> retrieving revision 1.10
> diff -u -3 -r1.9 -r1.10
> --- Delay.in 11 Feb 2006 14:21:20 -0000 1.9
> +++ Delay.in 9 Mar 2006 12:55:10 -0000 1.10
> @@ -102,7 +102,7 @@
> Delay_stat
> then
> ocf_log info "Delay is running OK"
> - return $OCF_SUCCESS
> + return_master $OCF_SUCCESS
> else
> ocf_log info "Delay is stopped"
> return $OCF_NOT_RUNNING
Not multi-state capable.
> ===================================================================
> RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/Dummy.in,v
> retrieving revision 1.2
> retrieving revision 1.3
> diff -u -3 -r1.2 -r1.3
> --- Dummy.in 3 Jun 2005 03:27:31 -0000 1.2
> +++ Dummy.in 9 Mar 2006 12:55:10 -0000 1.3
> @@ -1,6 +1,6 @@
> #!/bin/sh
> #
> -# $Id: Dummy.in,v 1.2 2005/06/03 03:27:31 sunjd Exp $
> +# $Id: Dummy.in,v 1.3 2006/03/09 12:55:10 xunsun Exp $
> #
> # Dummy OCF RA. Does nothing but wait a few seconds, can be
> # configured to fail occassionally.
> @@ -114,6 +114,7 @@
>
> dummy_monitor() {
> sleep $OCF_RESKEY_monitor_delay
> + return_monitor $?
> }
This very clearly has never been tested. I leave it to the reader to
figure out why this is so clearly obvious.
> ===================================================================
> RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/Filesystem.in,v
> retrieving revision 1.15
> retrieving revision 1.16
> diff -u -3 -r1.15 -r1.16
> --- Filesystem.in 11 Feb 2006 13:40:42 -0000 1.15
> +++ Filesystem.in 9 Mar 2006 12:55:10 -0000 1.16
> @@ -1,6 +1,6 @@
> #!/bin/sh
> #
> -# $Id: Filesystem.in,v 1.15 2006/02/11 13:40:42 xunsun Exp $
> +# $Id: Filesystem.in,v 1.16 2006/03/09 12:55:10 xunsun Exp $
> #
> # Support: [EMAIL PROTECTED]
> # License: GNU General Public License (GPL)
> @@ -95,7 +95,7 @@
> usage() {
> cat <<-EOT
> usage: $0 {start|stop|status|monitor|validate-all|meta-data}
> - $Id: Filesystem.in,v 1.15 2006/02/11 13:40:42 xunsun Exp $
> + $Id: Filesystem.in,v 1.16 2006/03/09 12:55:10 xunsun Exp $
> EOT
> }
>
> @@ -348,7 +348,7 @@
> case "$OP" in
> status) ocf_log info "$msg";;
> esac
> - return $rc
> + return_master $rc
Not a multi-state resource. (I get tired of writing this so this will be
the last agent I'm flagging. But it's amazing you manage to break our
second most important RA w/o testing it.)
> --- IPv6addr.c 20 Dec 2005 08:34:47 -0000 1.9
> +++ IPv6addr.c 9 Mar 2006 12:55:10 -0000 1.10
> @@ -81,10 +81,15 @@
> * return 0(OCF_SUCCESS) for response correctly.
> * return 1(OCF_NOT_RUNNING) for no response.
> * return 2(OCF_ERR_ARGS) for invalid or excess argument(s)
> - *
> + * return 8(OCF_RUNNING_MASTER) for running in "master" mode
> */
>
> +#include <portability.h>
> +#ifdef HAVE_CONFIG_H
> #include <config.h>
> +#endif
> +
> +#include <stdlib.h>
> #include <sys/types.h>
> #include <netinet/icmp6.h>
> #include <libgen.h>
> @@ -109,6 +114,8 @@
> 5 program is not installed
> 6 program is not configured
> 7 program is not running
> +8 resource is running in "master" mode and fully operational
> +9 resource is in "master" mode but in a failed state
> */
> #define OCF_SUCCESS 0
> #define OCF_ERR_GENERIC 1
> @@ -118,6 +125,8 @@
> #define OCF_ERR_INSTALLED 5
> #define OCF_ERR_CONFIGURED 6
> #define OCF_NOT_RUNNING 7
> +#define OCF_RUNNING_MASTER 8
> +#define OCF_FAILED_MASTER 9
Actually it seems advisable to have an ocf_ra.h headerfile somewhere and
define these there instead of duplicating LRM/CRM code within the RAs.
> @@ -363,11 +372,14 @@
> int
> monitor_addr6(struct in6_addr* addr6, int prefix_len)
> {
> + char* interval;
> if(0 == is_addr6_available(addr6)) {
> + interval = getenv("OCF_RESKEY_interval");
> + if (interval && atoi(interval) == 0)
> + return OCF_RUNNING_MASTER;
I don't even begin to understand the relevance of the interval here?
Probing isn't anything special; this error code is relevant for monitor
too. You really shouldn't care about this.
(See below.)
> --- ServeRAID.in 26 Jan 2006 18:00:05 -0000 1.8
> +++ ServeRAID.in 9 Mar 2006 12:55:10 -0000 1.9
> @@ -1,6 +1,6 @@
> #!/bin/sh
> #
> -# $Id: ServeRAID.in,v 1.8 2006/01/26 18:00:05 lars Exp $
> +# $Id: ServeRAID.in,v 1.9 2006/03/09 12:55:10 xunsun Exp $
> #
> # ServeRAID
> #
> @@ -104,7 +104,7 @@
> You will need at least version 6.10 (~July 2003 release) of the ipssend
> command for this script to work.
>
> - $Id: ServeRAID.in,v 1.8 2006/01/26 18:00:05 lars Exp $
> + $Id: ServeRAID.in,v 1.9 2006/03/09 12:55:10 xunsun Exp $
> !
> }
>
> @@ -210,30 +210,30 @@
> # codes reversed - it returns 1 for success, and 0 for failure...
> # We account for that too...
> #
> -run() {
> - output=`"$@" 2>&1`
> - rc=$?
> - output=`echo $output`
> - if
> - [ $rc -eq $srsuccess ]
> - then
> - if
> - [ ! -z "$output" ]
> - then
> - ocf_log "info" "$output"
> - fi
> - return 0
> - else
> - if
> - [ ! -z "$output" ]
> - then
> - ocf_log "err" "$output"
> - else
> - ocf_log "err" "command failed: $*"
> - fi
> - return 1
> - fi
> -}
> +#run() {
> +# output=`"$@" 2>&1`
> +# rc=$?
> +# output=`echo $output`
> +# if
> +# [ $rc -eq $srsuccess ]
> +# then
> +# if
> +# [ ! -z "$output" ]
> +# then
> +# ocf_log "info" "$output"
> +# fi
> +# return 0
> +# else
> +# if
> +# [ ! -z "$output" ]
> +# then
> +# ocf_log "err" "$output"
> +# else
> +# ocf_log "err" "command failed: $*"
> +# fi
> +# return 1
> +# fi
> +#}
Could you please decide whether you want to remove it like in the others
or not? Why are you commenting this one out?
> SRLogicalDriveConfig() {
> $IPS getconfig $sr_adapter ld
> @@ -435,7 +435,7 @@
> ServeRAID_status $serveraid $mergegroup
> then
> ocf_log debug "ServeRAID merge group $serveraid $mergegroup is
> running."
> - exit $OCF_SUCCESS
> + return_master $OCF_SUCCESS
This works (because of the later "exit $?") but is non-obvious code.
You're merely setting the exit code, but not actually returning from the
script, as this might be read as.
There's other instances of this fragment in this patch. I'm not flagging
all of them.
> ===================================================================
> RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/apache.in,v
> retrieving revision 1.14
> retrieving revision 1.15
> diff -u -3 -r1.14 -r1.15
> --- apache.in 11 Feb 2006 14:34:11 -0000 1.14
> +++ apache.in 9 Mar 2006 12:55:10 -0000 1.15
> @@ -403,9 +376,10 @@
> silent_status
> then
> run sh -c "$WGET $WGETOPTS $STATUSURL | grep -i '</ *body *></ *html *>'
> >/dev/null"
> + return_master $?
> else
> - ocf_log err "$CMD not running"
> - return $OCF_ERR_GENERIC;
> + ocf_log info "$CMD not running"
> + return $OCF_NOT_RUNNING
> fi
The return_master is obviously wrong, the second part of the chunk is
actually correct.
> ===================================================================
> RCS file: /home/cvs/linux-ha/linux-ha/resources/OCF/ocf-shellfuncs.in,v
> retrieving revision 1.24
> retrieving revision 1.25
> diff -u -3 -r1.24 -r1.25
> --- ocf-shellfuncs.in 18 Nov 2005 05:48:40 -0000 1.24
> +++ ocf-shellfuncs.in 9 Mar 2006 12:55:10 -0000 1.25
> @@ -1,5 +1,5 @@
> #
> -# $Id: ocf-shellfuncs.in,v 1.24 2005/11/18 05:48:40 alan Exp $
> +# $Id: ocf-shellfuncs.in,v 1.25 2006/03/09 12:55:10 xunsun Exp $
> #
> # Common helper functions for the OCF Resource Agents supplied by
> # heartbeat.
> @@ -208,4 +208,46 @@
> ha_log "${__OCF_PRIO}: $__OCF_MSG"
> }
>
> +return_master() {
> + if [ $# -ne 1 ]; then
> + return $OCF_ERR_GENERIC
> + fi
> +
> + rc=$1
> + if [ "${__OCF_ACTION}" = "monitor" -a "${OCF_RESKEY_interval}" = "0" ];
> then
> + if [ "$rc" = "$OCF_SUCCESS" ]; then
> + return $OCF_RUNNING_MASTER
> + fi
> +
> + if [ "$rc" = "$OCF_ERR_GENERIC" ]; then
> + return $OCF_FAILED_MASTER
> + fi
> + fi
> +
> + return $rc
> +}
How does the RESKEY_interval figure in here? It really shouldn't matter.
If it actually _does_ matter (ie, monitor supposed to return different
things at probe time than at runtime), I'll whack Andrew for you.
But this is entirely broken, called in all the wrong places, and not
just that, but a wrong idiom too.
A RA which _does_ support multi-state will know when to return
RUNNING_MASTER or FAILED_MASTER in its monitor function, and it'll be
much better off doing so directly than jumping through this ugly
indirection.
This entire patch (except for the very few legit chunks of bugfixes and
the run() consolidation maybe, but if you move it to ocf-shellfuncs, it
should be called ocf_run() for consistency) should be backed out with
prejudice.
Sincerely,
Lars Marowsky-Brée
--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin
"Ignorance more frequently begets confidence than does knowledge"
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/