On Wed, Dec 29, 2010 at 11:01:19AM +0100, [email protected] wrote:
> > > # HG changeset patch
> > > # User Alexander Krauth <[email protected]>
> > > # Date 1293543578 -3600
> > > # Node ID 73e079bc06373c5dc032ac294a344a848f304ccf
> > > # Parent  af62007952a5b32281622e88f3e3faf039aec187
> > > High: SAPInstance: Make more use of ocf-shellfuncs where possible
> > > 
> > > diff -r af62007952a5 -r 73e079bc0637 heartbeat/SAPInstance
> > > --- a/heartbeat/SAPInstance   Tue Dec 28 14:36:11 2010 +0100
> > > +++ b/heartbeat/SAPInstance   Tue Dec 28 14:39:38 2010 +0100
> > > @@ -208,6 +208,27 @@
> > > 
> > > 
> > >  #
> > > +# abnormal_end : essential things are missing, but in the nature of a SAP installation - which can be very different
> > > +#                from customer to customer - we cannot handle this always as an error
> > > +#                This would be the case, if the software is installed on shared disks and not visible
> > > +#                to all cluster nodes at all times.
> > > +#
> > > +abnormal_end() {
> > > +  err_msg=$1
> > > +
> > > +  ocf_is_probe && exit sapinstance_status
> > 
> > This won't work. Perhaps you wanted to do sth like this:
> > 
> >   ocf_is_probe && {
> >     sapinstance_status
> >     exit
> >   }
> 
> Hm, I don't know how I came up with this!?
> I'll change it into:
> 
>    ocf_is_probe && {
>      sapinstance_status
>      exit $?
>    }
> 
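Why the original line fails: `exit` expects a numeric status, so `exit sapinstance_status` hands the function name to `exit` as a literal word instead of calling the function. Below is a minimal sketch contrasting the two forms. `ocf_is_probe` and `sapinstance_status` are stubs (assumed to behave like the real helpers) so the script runs on its own. A bare `exit` reuses the status of the last command, so Dejan's `exit` and the `exit $?` above behave the same; `$?` just makes it explicit.

```shell
# Sketch only: stubs replace ocf-shellfuncs and the RA function.
OCF_NOT_RUNNING=7

ocf_is_probe() { return 0; }                       # stub: always a probe
sapinstance_status() { return $OCF_NOT_RUNNING; }  # stub: instance is down

# Broken: "sapinstance_status" is passed to exit as a literal word; the
# function never runs and the shell rejects the non-numeric status.
probe_exit_broken() {
  ocf_is_probe && exit sapinstance_status
}

# Fixed: run the status check first, then exit with its return code.
probe_exit_fixed() {
  ocf_is_probe && {
    sapinstance_status
    exit $?
  }
}
```

Run each variant in a subshell, e.g. `( probe_exit_fixed ); echo $?`, which prints 7; the broken variant instead ends with a shell error (bash reports "numeric argument required").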
> > > +
> > > +  if [ "$ACTION" = "stop" ]
> > > +  then
> > > +    cleanup_instance
> > > +    exit $OCF_SUCCESS
> > > +  fi
> > > +
> > > +  ocf_log err $err_msg
> > > +  exit $OCF_ERR_ARGS
> > 
> > This should be:
> > 
> >   exit $OCF_ERR_CONFIGURED
> 
> OK.
> 
> > > +}
> > > +
> > > +#
> > >  # sapinstance_init : Define global variables with default values, if optional parameters are not set
> > >  #
> > >  #
> > > @@ -233,16 +254,18 @@
> > >        DIR_EXECUTABLE="/usr/sap/$SID/SYS/exe/run"
> > >        SAPSTARTSRV="/usr/sap/$SID/SYS/exe/run/sapstartsrv"
> > >        SAPCONTROL="/usr/sap/$SID/SYS/exe/run/sapcontrol"
> > > -    else
> > > -      ocf_log warn "Cannot find sapstartsrv and sapcontrol executable, please set DIR_EXECUTABLE parameter!"
> > > -      exit $OCF_NOT_RUNNING
> > >      fi
> > >    else
> > > -    DIR_EXECUTABLE="$OCF_RESKEY_DIR_EXECUTABLE"
> > > -    SAPSTARTSRV="$OCF_RESKEY_DIR_EXECUTABLE/sapstartsrv"
> > > -    SAPCONTROL="$OCF_RESKEY_DIR_EXECUTABLE/sapcontrol"
> > > +    if have_binary "$OCF_RESKEY_DIR_EXECUTABLE/sapstartsrv" && have_binary "$OCF_RESKEY_DIR_EXECUTABLE/sapcontrol"
> > > +    then
> > > +      DIR_EXECUTABLE="$OCF_RESKEY_DIR_EXECUTABLE"
> > > +      SAPSTARTSRV="$OCF_RESKEY_DIR_EXECUTABLE/sapstartsrv"
> > > +      SAPCONTROL="$OCF_RESKEY_DIR_EXECUTABLE/sapcontrol"
> > > +    fi
> > >    fi
> > > 
> > > +  [ -z "$DIR_EXECUTABLE" ] && abnormal_end "Cannot find sapstartsrv and sapcontrol executable, please set DIR_EXECUTABLE parameter!"
> > > +
> > >    if [ -z "$OCF_RESKEY_DIR_PROFILE" ]
> > >    then
> > >      DIR_PROFILE="/usr/sap/$SID/SYS/profile"
> > > @@ -329,15 +352,10 @@
> > >      then
> > >        DIR_PROFILE="/usr/sap/$SID/SYS/profile"
> > >      else
> > > -      ocf_log warn "Expected /usr/sap/$SID/SYS/profile/ to be a directory, please set DIR_PROFILE parameter!"
> > > -      exit $OCF_NOT_RUNNING
> > > +      abnormal_end "Expected /usr/sap/$SID/SYS/profile/ to be a directory, please set DIR_PROFILE parameter!"
> > >      fi
> > > 
> > > -    if [ ! -r $SAPSTARTPROFILE ]
> > > -    then
> > > -      ocf_log warn "Expected $SAPSTARTPROFILE to be the instance START profile, please set START_PROFILE parameter!"
> > > -      exit $OCF_NOT_RUNNING
> > > -    fi
> > > +    [ ! -r $SAPSTARTPROFILE ] && abnormal_end "Expected $SAPSTARTPROFILE to be the instance START profile, please set START_PROFILE parameter!"
> > 
> > Though this one should exit with code $OCF_ERR_INSTALLED.
> > Perhaps add the code parameter to abnormal_end?
> 
> Hm, this tests the SAP start profile, but the actual error is that
> OCF_RESKEY_START_PROFILE is set to the wrong value (the file almost
> certainly exists; we just don't know its correct name).
> 
> Isn't that an OCF_ERR_CONFIGURED?

It could be either, i.e. the file could be missing on a node.
Well, you can apply your SAP expertise here :) For instance, if
the profile is on the shared storage then it's safe to say
OCF_ERR_CONFIGURED.
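Following the suggestion to give `abnormal_end` a code parameter, here is a minimal sketch. It assumes the standard OCF values (5 = OCF_ERR_INSTALLED, 6 = OCF_ERR_CONFIGURED, 7 = OCF_NOT_RUNNING) and stubs `ocf_is_probe`, `ocf_log`, `sapinstance_status` and `cleanup_instance`, which in the real RA come from ocf-shellfuncs and SAPInstance. The optional second argument and its OCF_ERR_CONFIGURED default are illustrative, not part of the patch.

```shell
# Standard OCF return codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_ERR_CONFIGURED=6
OCF_NOT_RUNNING=7

# Stubs for the sketch; the real helpers live in ocf-shellfuncs / SAPInstance.
ocf_is_probe() {
  [ "$__OCF_ACTION" = "monitor" ] && [ "$OCF_RESKEY_CRM_meta_interval" = 0 ]
}
ocf_log() { shift; echo "ERROR: $*" >&2; }   # stub: drops the level, prints to stderr
sapinstance_status() { return $OCF_NOT_RUNNING; }
cleanup_instance() { :; }

# abnormal_end MESSAGE [EXIT_CODE]
# EXIT_CODE defaults to OCF_ERR_CONFIGURED; pass OCF_ERR_INSTALLED when the
# file may legitimately be missing on this node only.
abnormal_end() {
  err_msg=$1
  err_rc=${2:-$OCF_ERR_CONFIGURED}

  # During a probe, report the actual state instead of an error.
  ocf_is_probe && {
    sapinstance_status
    exit $?
  }

  # A stop must succeed even if the installation is not visible here.
  if [ "$ACTION" = "stop" ]
  then
    cleanup_instance
    exit $OCF_SUCCESS
  fi

  ocf_log err "$err_msg"
  exit $err_rc
}
```

With this shape, the start-profile check could call `abnormal_end "..." $OCF_ERR_INSTALLED` when the profile lives on local disk, and keep the default when it is on shared storage.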

> > >      pkill -9 -f "sapstartsrv.*$runninginst"
> > >      $SAPSTARTSRV pf=$SAPSTARTPROFILE -D -u $sidadm
> > > @@ -357,7 +375,8 @@
> > >        chkrc=$OCF_SUCCESS
> > >      else
> > >        ocf_log error "sapstartsrv for instance $SID-$InstanceName could not be started!"
> > > -      chkrc=$OCF_NOT_RUNNING
> > > +      chkrc=$OCF_ERR_GENERIC
> > > +      [ "$__OCF_ACTION" = "monitor" ] && chkrc=$OCF_NOT_RUNNING
> > 
> > Better:
> > 
> >       ocf_is_probe && chkrc=$OCF_NOT_RUNNING
> 
> I'd like to change it into:
> 
>       [ "$ACTION" = "monitor" ] && chkrc=$OCF_NOT_RUNNING
> 
> Because probe AND monitor were meant here.

Note that monitor runs only in case the resource has been
previously successfully started. So, this is an unexpected
condition. We can't say if the resource is completely down,
right? I mean, we don't know how it was stopped.

BTW, if sapstartsrv isn't running, the SAP instance could still
be up, right?
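The distinction matters because `ocf_is_probe` is narrower than a check on the action name: a probe is a one-off `monitor` with a zero interval, run to discover the current state, whereas a recurring monitor has a non-zero interval and only runs after a successful start. A sketch of the return-code choice under that reading; the `ocf_is_probe` body is modelled on the ocf-shellfuncs helper (an assumption, check the installed copy), and `sapstartsrv_failed_rc` is a hypothetical name.

```shell
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Modelled on ocf-shellfuncs: a probe is a monitor with a zero interval.
ocf_is_probe() {
  [ "$__OCF_ACTION" = "monitor" ] && [ "$OCF_RESKEY_CRM_meta_interval" = 0 ]
}

# Return code when sapstartsrv could not be started (hypothetical helper).
# Probe: nothing was started by the cluster yet, so "not running" is valid.
# Recurring monitor: the resource was started before, so this is a failure.
sapstartsrv_failed_rc() {
  chkrc=$OCF_ERR_GENERIC
  ocf_is_probe && chkrc=$OCF_NOT_RUNNING
  echo "$chkrc"
}
```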

> > >      fi
> > >    fi
> > > 
> > > @@ -415,9 +434,12 @@
> > >      loopcount=$(($loopcount + 1))
> > > 
> > >      check_sapstartsrv
> > > -    output=`$SAPCONTROL -nr $InstanceNr -function Start`
> > >      rc=$?
> > > -    ocf_log info "Starting SAP Instance $SID-$InstanceName: $output"
> > > +    if [ $rc -eq $OCF_SUCCESS ]; then
> > > +      output=`$SAPCONTROL -nr $InstanceNr -function Start`
> > > +      rc=$?
> > > +      ocf_log info "Starting SAP Instance $SID-$InstanceName: $output"
> > > +    fi
> > > 
> > >      if [ $rc -ne 0 ]
> > >      then
> > > @@ -491,8 +513,13 @@
> > >    sapuserexit PRE_STOP_USEREXIT "$OCF_RESKEY_PRE_STOP_USEREXIT"
> > > 
> > >    check_sapstartsrv
> > > +  rc=$?
> > > +  if [ $rc -eq $OCF_SUCCESS ]; then
> > > +    output=`$SAPCONTROL -nr $InstanceNr -function Stop`
> > > +    rc=$?
> > > +    ocf_log info "Stopping SAP Instance $SID-$InstanceName: $output"
> > > +  fi
> > > 
> > > -  output=`$SAPCONTROL -nr $InstanceNr -function Stop`
> > >    if [ $? -eq 0 ]
> > >    then
> > >      output=`$SAPCONTROL -nr $InstanceNr -function WaitforStopped 3600 1`
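One shell detail in the stop hunk above: the context line `if [ $? -eq 0 ]` now follows the new `if ... fi` block. An `if` whose condition is false and that has no `else` returns 0, so when `check_sapstartsrv` fails, `$?` there is 0 and the stop would go on to `WaitforStopped`; when the branch does run, `$?` is the status of the `ocf_log` call, not of `sapcontrol`. A sketch that tests the saved `$rc` instead; `check_sapstartsrv` and `sapcontrol_stop` are hypothetical stubs, with a failing `check_sapstartsrv` chosen to show the skipped branch.

```shell
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Stubs: sapstartsrv cannot be started, so Stop is never sent.
check_sapstartsrv() { return $OCF_ERR_GENERIC; }
sapcontrol_stop() { echo "Stop sent"; }

# Shows the pitfall: after a skipped if-branch, $? is 0, not the saved rc.
status_after_skipped_if() {
  rc=0
  check_sapstartsrv || rc=$?
  if [ $rc -eq $OCF_SUCCESS ]; then
    sapcontrol_stop || rc=$?
  fi
  echo "\$?=$? rc=$rc"
}

# Safer: decide on $rc, which holds the status of whichever step ran last.
stop_sketch() {
  rc=0
  check_sapstartsrv || rc=$?
  if [ $rc -eq $OCF_SUCCESS ]; then
    sapcontrol_stop || rc=$?
  fi
  if [ $rc -eq $OCF_SUCCESS ]; then
    echo "waiting for instance to stop"
  fi
  return $rc
}
```

Here `status_after_skipped_if` prints `$?=0 rc=1`, while `stop_sketch` returns 1 without claiming to wait for the stop.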
> > > @@ -558,17 +585,34 @@
> > > 
> > >      if [ $count -eq 0 -a $rc -eq $OCF_SUCCESS ]
> > >      then
> > > -      if [ "$MONLOG" != "NOLOG" ]
> > > +      if ocf_is_probe
> > >        then
> > > -        ocf_log err "The SAP instance does not run any services which this RA could monitor!"
> > > +        rc=$OCF_NOT_RUNNING
> > > +      else
> > > +        [ "$MONLOG" != "NOLOG" ] && ocf_log err "The SAP instance does not run any services which this RA could monitor!"
> > > +        rc=$OCF_ERR_ARGS
> > 
> > I think that this should be $OCF_ERR_GENERIC. It is expected at
> > this point to have the SAP running, but it isn't and we don't
> > know why. BTW, $OCF_ERR_ARGS should be used only on bad usage
> > (i.e. wrong number of args on the command line).
> 
> Ok.
> 
> > >        fi
> > > -      rc=$OCF_ERR_ARGS
> > >      fi
> > >    fi
> > > 
> > >    return $rc
> > >  }
> > > 
> > > +
> > > +#
> > > +# sapinstance_status: Lightweight check of SAP instance only with OS tools
> > > +#
> > > +sapinstance_status() {
> > > +  [ ! -f "/usr/sap/$SID/$InstanceName/work/kill.sap" ] && return $OCF_NOT_RUNNING
> > > +  pids=`grep '^kill -[0-9]' /usr/sap/$SID/$InstanceName/work/kill.sap | awk '{print $3}'`
> > > +  for pid in $pids
> > 
> > Expect multiple pids? Then the awk part is wrong. Better do:
> > 
> >   pids=`grep '^kill -[0-9]' /usr/sap/$SID/$InstanceName/work/kill.sap | cut -f3- -d' '`
> 
> Why shouldn't the awk work?
> I had used 'cut' at first, but I'm not absolutely sure that there is
> always only one blank between the args.
> 
> a...@ncc1701d:~> echo "kill -2 12345" > kill.sap
> a...@ncc1701d:~> echo "kill -2   345" >> kill.sap
> a...@ncc1701d:~> grep '^kill -[0-9]' kill.sap | awk '{print $3}'
> 12345
> 345

Somehow I imagined that all pids would be on one line, i.e. sth
like "kill -2 12345 567 ..."  So, awk is fine here.
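For reference, the difference behind the awk/cut question: `cut -d' '` treats every single blank as a separator, so repeated blanks create empty fields, while awk's default field splitting treats a run of blanks as one separator. (With the `-f3-` form, the extra blanks end up as leading whitespace, which the unquoted `for pid in $pids` discards.) A small sketch reading `kill -N pid` lines on stdin; the function names are hypothetical, and the last variant, which prints every field from the third on, would cover both repeated blanks and several pids on one line.

```shell
# Each function reads "kill -N pid ..." lines on stdin and prints pids.

# awk's default field splitting treats a run of blanks as one separator,
# so $3 is the first pid even when extra spaces are present.
pids_awk() {
  grep '^kill -[0-9]' | awk '{print $3}'
}

# cut splits on every single blank: "kill -2   345" yields an empty field 3.
pids_cut() {
  grep '^kill -[0-9]' | cut -d' ' -f3
}

# Hypothetical variant: print every field from the third on, one per line.
pids_awk_all() {
  grep '^kill -[0-9]' | awk '{ for (i = 3; i <= NF; i++) print $i }'
}

printf 'kill -2 12345\nkill -2   345\n' | pids_awk   # prints 12345 and 345
printf 'kill -2 12345\nkill -2   345\n' | pids_cut   # prints 12345 and an empty line
```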

> > > +  do
> > > +    [ `pgrep -f -U $sidadm $InstanceName | grep -c $pid` -gt 0 ] && return $OCF_SUCCESS
> > > +  done
> > > +  return $OCF_NOT_RUNNING
> > > +}
> > > +
> > > +
> > >  #
> > >  # sapinstance_validate: Check the symantic of the input parameters 
> > >  #
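A related detail in `sapinstance_status`: `grep -c $pid` on the pgrep output is a substring match, so a stale pid 345 left in kill.sap would count as running if the instance owns pid 12345. A sketch of a whole-line comparison with `grep -qx`; `list_instance_pids` is a hypothetical stub standing in for `pgrep -f -U $sidadm $InstanceName`, with fixed sample pids.

```shell
# Stub standing in for: pgrep -f -U $sidadm $InstanceName
list_instance_pids() { printf '%s\n' 12345 4711; }

# As in the patch: counts lines containing the pid anywhere, so "345"
# also matches "12345".
pid_running_substring() {
  [ "$(list_instance_pids | grep -c "$1")" -gt 0 ]
}

# Whole-line match (-x): only an identical pid counts; -q suppresses output
# and reports the result through the exit status.
pid_running_exact() {
  list_instance_pids | grep -qx "$1"
}
```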
> > 
> > Thanks,
> > 
> > Dejan
> 
> Thanks for your help,
> Alex

Welcome.

Dejan

> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/