I'm not entirely new to Heartbeat2, but I've run into something here that I 
have not been able to figure out. What I'm trying to do is create a JBoss 
resource, as part of a resource group (disk, ip, mysql, jboss), for an 
application. I have the disk, ip, and MySQL resources working; it's just the 
JBoss resource that's proving to be more difficult than expected. This 
particular JBoss application takes a while to get fully started, which is where 
I think I'm running into trouble. More on that below.


The servers are SLES10, and I'm using Heartbeat2:

Linux sles10-3 2.6.16.60-0.34-default #1 Fri Jan 16 14:59:01 UTC 2009 i686 i686 i386 GNU/Linux

heartbeat-2.1.4-0.11



I've defined my JBoss application resource:

<primitive class="ocf" type="jboss" provider="heartbeat" is_managed="true" 
id="JBoss_4">
 <instance_attributes id="JBoss_4_instance_attrs">
  <attributes>
   <nvpair name="resource_name" value="IDMProv" 
id="4609063b-c767-4956-a2f9-f44f46b634a9"/>
   <nvpair name="console" value="/shared/uadisk/rbpm37/jboss.log" 
id="a78e869b-2b00-474c-987e-5919c1ce80e7"/>
   <nvpair name="shutdown_timeout" value="60" 
id="16812ae8-4f3c-46df-bdbe-24f7e0fcd557"/>
   <nvpair name="user" value="rbpm" id="f8ccf1ed-c701-4dc0-b8e5-d66c779a8b9f"/>
   <nvpair name="statusurl" value="http://131.156.12.4:8080/IDMProv" 
id="11d32111-4e2b-4224-99d2-20af4eb43eb8"/>
   <nvpair name="java_home" value="/usr/java/jre1.6.0_18/" 
id="2881e29c-4b87-4094-bffc-d0d9e9682e16"/>
   <nvpair name="jboss_home" value="/shared/uadisk/rbpm37/jboss" 
id="d21e26f2-3e2f-4a98-9ecd-8f934b844434"/>
   <nvpair name="run_opts" value="-c IDMProv -b 0.0.0.0" 
id="681efa45-d7f3-4ef0-b90d-5d652b6480d6"/>
   <nvpair name="shutdown_opts" value="-S" 
id="f10343d3-28cc-4497-b234-e4678dda5818"/>
  </attributes>
 </instance_attributes>
 <operations>
  <op name="monitor" interval="10s" timeout="600s" start_delay="600s" 
id="317075e9-dabb-4923-9129-be16882f94a4"/>
  <op name="start" interval="900s" timeout="600s" start_delay="10s" 
id="ab407ca5-78e4-48e9-bee0-f70f64d011e4"/>
  <op name="stop" interval="10s" timeout="600s" start_delay="10s" 
id="33d5a72d-105c-4da0-99cb-b25de520a5ae"/>
 </operations>
</primitive>

This has gone through many iterations over the last few days. This is what's 
currently in the CIB.


This particular Heartbeat2 version didn't include a JBoss OCF script, but I 
obtained this one from the list archives:

#!/bin/sh
#
# Description:  Manages a Jboss Server as an OCF High-Availability
#               resource under Heartbeat/LinuxHA control
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
# 
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# 
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  
# 02110-1301, USA.
#
# Copyright (c) 2009 Bauer Systems KG / Stefan Schluppeck
#
#########################################################################################################################################
# OCF parameters:
#   OCF_RESKEY_resource_name - The name of the resource. Default is ${OCF_RESOURCE_INSTANCE}
# why not let the RA log through lrmd?
# 2009/09/09 Nakahira:
# jboss_console is used to record output of the "run.sh".
# The log of "Run.sh" should not be output to ha-log because it is so annoying.
#   OCF_RESKEY_console - A destination of the log of jboss run and shutdown script. Default is /var/log/${OCF_RESKEY_resource_name}.log
#   OCF_RESKEY_shutdown_timeout - Time-out at the time of the stop. Default is 5
#   OCF_RESKEY_kill_timeout - The re-try number of times awaiting a stop. Default is 10
#   OCF_RESKEY_user - A user name to start a JBoss. Default is root
#   OCF_RESKEY_statusurl - URL for state confirmation. Default is http://127.0.0.1:8080
#   OCF_RESKEY_java_home - Home directory of the Java. Default is ${JAVA_HOME}
#   OCF_RESKEY_jboss_home - Home directory of Jboss. Default is None
# is it possible to devise this string from options? I'm afraid
# that allowing users to set this could be error prone.
# 2009/09/09 Nakahira:
# It is difficult to set it automatically because jboss_pstring
# greatly depends on the environment. At any rate, system architect
# should note that pstring doesn't influence other processes.
#   OCF_RESKEY_pstring - String Jboss will found in procceslist. Default is "java -Dprogram.name=run.sh"
#   OCF_RESKEY_run_opts - Options for jboss to run. Default is "-c default -l lpg4j"
#   OCF_RESKEY_shutdown_opts - Options for jboss to shutdown. Default is "-s 127.0.0.1:1099"
#########################################################################################################################################

##################################################################################################################
# OCF_ROOT="/usr/lib/ocf"
#   OCF_RESKEY_resource_name
#   OCF_RESKEY_console
#   OCF_RESKEY_shutdown_timeout
#   OCF_RESKEY_kill_timeout
#   OCF_RESKEY_user
#   OCF_RESKEY_statusurl
#   OCF_RESKEY_java_home
#   OCF_RESKEY_jboss_home
#   OCF_RESKEY_pstring
#   OCF_RESKEY_run_opts
#   OCF_RESKEY_shutdown_opts

# OCF_ROOT="/usr/lib/ocf"
# OCF_RESKEY_resource_name="IDMProv"
# OCF_RESOURCE_INSTANCE="1"
# OCF_RESKEY_console="/shared/uadisk/rbpm37/jboss.log"
# OCF_RESKEY_user="rbpm"
# OCF_RESKEY_statusurl="http://131.156.12.4:8080/IDMProv"
# OCF_RESKEY_java_home="/usr/java/jre1.6.0_18/"
# OCF_RESKEY_jboss_home="/shared/uadisk/rbpm37/jboss"
# OCF_RESKEY_run_opts="-c IDMProv -b 0.0.0.0"
# OCF_RESKEY_shutdown_opts="-S"

echo .................................................................................................................. >> /shared/uadisk/rbpm37/ocf2.out
echo JBoss " " `date` >> /shared/uadisk/rbpm37/ocf2.out
echo OCF_ROOT is ${OCF_ROOT} >> /shared/uadisk/rbpm37/ocf2.out
echo   OCF_RESKEY_resource_name is ${OCF_RESKEY_resource_name} >> /shared/uadisk/rbpm37/ocf2.out
echo   OCF_RESKEY_console is ${OCF_RESKEY_console} >> /shared/uadisk/rbpm37/ocf2.out
echo   OCF_RESKEY_kill_timeout is ${OCF_RESKEY_kill_timeout} >> /shared/uadisk/rbpm37/ocf2.out
echo   OCF_RESKEY_user is ${OCF_RESKEY_user} >> /shared/uadisk/rbpm37/ocf2.out
echo   OCF_RESKEY_statusurl is ${OCF_RESKEY_statusurl} >> /shared/uadisk/rbpm37/ocf2.out
echo   OCF_RESKEY_java_home is ${OCF_RESKEY_java_home} >> /shared/uadisk/rbpm37/ocf2.out
echo   OCF_RESKEY_jboss_home is ${OCF_RESKEY_jboss_home} >> /shared/uadisk/rbpm37/ocf2.out
echo   "OCF_RESKEY_pstring is ${OCF_RESKEY_pstring}" >> /shared/uadisk/rbpm37/ocf2.out
echo   OCF_RESKEY_run_opts is ${OCF_RESKEY_run_opts} >> /shared/uadisk/rbpm37/ocf2.out
echo   OCF_RESKEY_shutdown_opts is ${OCF_RESKEY_shutdown_opts} >> /shared/uadisk/rbpm37/ocf2.out
echo   pwd is `pwd` >> /shared/uadisk/rbpm37/ocf2.out
echo /JBoss >> /shared/uadisk/rbpm37/ocf2.out
##################################################################################################################


. ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs

usage() 
{
        cat <<-!
usage: $0 action

action:
        start   start jboss

        stop    stop the jboss

        status  return the status of jboss, run or down

        monitor  return TRUE if the jboss appears to be working.
                 You have to have installed $WGETNAME for this to work.

        meta-data       show meta data message

        validate-all    validate the instance parameters
!
        return $OCF_ERR_ARGS
}

isrunning_jboss()
{
echo "(Is Running `date`)" >> /shared/uadisk/rbpm37/ocf2.out
        if wget -O /dev/null $STATUSURL 2>/dev/null; then
                return $OCF_SUCCESS
        fi
        # JBoss service error 
        return $OCF_ERR_GENERIC
}

monitor_jboss()
{
echo "(Monitor `date`)" >> /shared/uadisk/rbpm37/ocf2.out
        if ! pgrep -f "$PSTRING" > /dev/null; then
                return $OCF_NOT_RUNNING
        fi
        isrunning_jboss
}

start_jboss()
{
echo "(Start `date`)" >> /shared/uadisk/rbpm37/ocf2.out
        monitor_jboss
        if [ $? = $OCF_SUCCESS ]; then
                return $OCF_SUCCESS
        fi

        ocf_log info "Starting JBoss[$RESOURCE_NAME]"
        if [ "$JBOSS_USER" = root ]; then
                "$JBOSS_HOME/bin/run.sh" $RUN_OPTS \
                        >> "$CONSOLE" 2>&1 &
        else
echo "su - -s /bin/bash $JBOSS_USER -c export JAVA_HOME=$JAVA_HOME; export JBOSS_HOME=$JBOSS_HOME; $JBOSS_HOME/bin/run.sh $RUN_OPTS" >> /shared/uadisk/rbpm37/ocf2.out
echo "  JBOSS_USER is $JBOSS_USER" >> /shared/uadisk/rbpm37/ocf2.out
echo "  JAVA_HOME is $JAVA_HOME" >> /shared/uadisk/rbpm37/ocf2.out
echo "  JBOSS_HOME is $JBOSS_HOME" >> /shared/uadisk/rbpm37/ocf2.out
echo "  RUN_OPTS is $RUN_OPTS" >> /shared/uadisk/rbpm37/ocf2.out
echo "  pwd is `pwd`" >> /shared/uadisk/rbpm37/ocf2.out
echo "  1. `date`" >> /shared/uadisk/rbpm37/ocf2.out
                su - -s /bin/bash "$JBOSS_USER" \
                        -c "export JAVA_HOME=${JAVA_HOME};\
                            export JBOSS_HOME=${JBOSS_HOME};\
                            $JBOSS_HOME/bin/run.sh $RUN_OPTS" \
                        >> "$CONSOLE" 2>&1 &
echo "  2. `date`" >> /shared/uadisk/rbpm37/ocf2.out
        fi

        while true; do
echo "  3. `date`" >> /shared/uadisk/rbpm37/ocf2.out
                monitor_jboss
                if [ $? = $OCF_SUCCESS ]; then
                        break
                fi
                ocf_log debug "start_jboss[$RESOURCE_NAME]: retry monitor_jboss"
                sleep 3
        done
sleep 5

echo "  4. `date`" >> /shared/uadisk/rbpm37/ocf2.out

        return $OCF_SUCCESS
}

stop_jboss()
{
echo "(Stop `date`)" >> /shared/uadisk/rbpm37/ocf2.out
        ocf_log info "Stopping JBoss[$RESOURCE_NAME]"

        if [ "$JBOSS_USER" = root ]; then
                "$JBOSS_HOME/bin/shutdown.sh" $SHUTDOWN_OPTS -S \
                        >> "$CONSOLE" 2>&1 &
        else
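                # Continue lines with a plain backslash, as in start_jboss;
                # a literal "\n" here would be passed to the inner shell.
                su - -s /bin/bash "$JBOSS_USER" \
                        -c "export JAVA_HOME=${JAVA_HOME};\
                            export JBOSS_HOME=${JBOSS_HOME};\
                            $JBOSS_HOME/bin/shutdown.sh $SHUTDOWN_OPTS -S" \
                        >> "$CONSOLE" 2>&1 &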

        fi

        lapse_sec=0
        while pgrep -f "$PSTRING" > /dev/null; do
                sleep 1
                lapse_sec=`expr $lapse_sec + 1`
                ocf_log info "stop_jboss[$RESOURCE_NAME]: stop NORM $lapse_sec/$SHUTDOWN_TIMEOUT"
                if [ $lapse_sec -ge $SHUTDOWN_TIMEOUT ]; then
                        break
                fi
        done

        if pgrep -f "$PSTRING" > /dev/null; then 
                ocf_log info "stop_jboss[$RESOURCE_NAME]: output a JVM thread dump to $CONSOLE"
                pkill -QUIT -f "$PSTRING"
                lapse_sec=0
                while true; do
                        sleep 1
                        lapse_sec=`expr $lapse_sec + 1`
                        ocf_log info "stop_jboss[$RESOURCE_NAME]: kill jboss by SIGTERM ($lapse_sec/$KILL_TIMEOUT)"
                        pkill -TERM -f "$PSTRING"
                        if pgrep -f "$PSTRING" > /dev/null; then
                                if [ $lapse_sec -ge $KILL_TIMEOUT ]; then
                                        break
                                fi
                        else
                                break
                        fi
                done
        fi
        # If the JBoss process hangs, JBoss RA waits $SHUTDOWN_TIMEOUT
        # seconds and tries kill TERM and QUIT for $KILL_TIMEOUT seconds.
        # The stop timeout of RA should be
        # longer than $SHUTDOWN_TIMEOUT + $KILL_TIMEOUT.
        lapse_sec=0
        while pgrep -f "$PSTRING" > /dev/null; do
                sleep 1
                lapse_sec=`expr $lapse_sec + 1`
                ocf_log info "stop_jboss[$RESOURCE_NAME]: kill jboss by SIGKILL ($lapse_sec/@@@)"
                pkill -KILL -f "$PSTRING"
        done
        return $OCF_SUCCESS
}

status_jboss()
{
echo "(Status `date`)" >> /shared/uadisk/rbpm37/ocf2.out
        if ! pgrep -f "$PSTRING" > /dev/null; then
                echo "JBoss process[$RESOURCE_NAME] is not running."
                return $OCF_NOT_RUNNING
        fi

        if isrunning_jboss; then
                echo "JBoss[$RESOURCE_NAME] is running."
                return $OCF_SUCCESS
        else
                echo "JBoss process[$RESOURCE_NAME] is running."
                echo "But, we can not access JBoss web service."
                return $OCF_NOT_RUNNING
        fi
}


metadata_jboss()
{
    cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="jboss">
<version>1.0</version>

<longdesc lang="en">
Resource script for Jboss. It manages a Jboss instance as an HA resource.
</longdesc>
<shortdesc lang="en">jboss resource agent</shortdesc>

<parameters>

<parameter name="resource_name" unique="1" required="0">
<longdesc lang="en">
The name of the resource. Defaults to the name of the resource
instance.
</longdesc>
<shortdesc>The name of the resource</shortdesc>
<content type="string" default="${OCF_RESOURCE_INSTANCE}" />
</parameter>

<parameter name="console" unique="1" required="0">
<longdesc lang="en">
A destination of the log of jboss run and shutdown script.
</longdesc>
<shortdesc>jboss log path</shortdesc>
<content type="string" default="" />
</parameter>

<parameter name="shutdown_timeout" unique="0" required="0">
<longdesc lang="en">
Timeout for jboss bin/shutdown.sh. We wait for this timeout to
expire, then send the TERM and QUIT signals. Finally, the KILL
signal is used to terminate the jboss process. You should set the
timeout for the stop operation to a value bigger than the sum of
the timeout parameters. See also kill_timeout.
</longdesc>
<shortdesc>shutdown timeout</shortdesc>
<content type="integer" default="5" />
</parameter>

<parameter name="kill_timeout" unique="0" required="0">
<longdesc lang="en">
If bin/shutdown.sh doesn't stop the jboss process, then we send
it TERM and QUIT signals, intermittently and once a second. After
this timeout expires, if the process is still live, we use the
KILL signal. See also shutdown_timeout.
</longdesc>
<shortdesc>stop by signal timeout</shortdesc>
<content type="integer" default="10" />
</parameter>

<parameter name="user" unique="0" required="0">
<longdesc lang="en">
A user name to start a JBoss.
</longdesc>
<shortdesc>A user name to start a resource.</shortdesc>
<content type="string" default="root"/>
</parameter>

<parameter name="statusurl" unique="0" required="0">
<longdesc lang="en">
URL to test in the monitor operation.
</longdesc>
<shortdesc>URL to test in the monitor operation.</shortdesc>
<content type="string" default="http://127.0.0.1:8080" />
</parameter>

<parameter name="java_home" unique="0" required="0">
<longdesc lang="en">
Home directory of Java.
</longdesc>
<shortdesc>Home directory of Java.</shortdesc>
<content type="string" default=""/>
</parameter>

<parameter name="jboss_home" unique="1" required="1">
<longdesc lang="en">
Home directory of Jboss.
</longdesc>
<shortdesc>Home directory of Jboss.</shortdesc>
<content type="string" default=""/>
</parameter>

<parameter name="pstring" unique="0" required="0">
<longdesc lang="en">
With this string heartbeat matches for the right process to kill.
</longdesc>
<shortdesc>pkill/pgrep search string</shortdesc>
<content type="string" default="java -Dprogram.name=run.sh" />
</parameter>

<parameter name="run_opts" unique="0" required="0">
<longdesc lang="en">
Start options to start Jboss with, defaults are from the Jboss-Doku.
</longdesc>
<shortdesc>options for jboss run.sh</shortdesc>
<content type="string" default="-c default -l lpg4j" />
</parameter>

<parameter name="shutdown_opts" unique="0" required="0">
<longdesc lang="en">
Stop options to stop Jboss with.
</longdesc>
<shortdesc>options for jboss shutdown.sh</shortdesc>
<content type="string" default="-s 127.0.0.1:1099" />
</parameter>

</parameters>

<actions>
<action name="start" timeout="60s" />
<action name="stop" timeout="120s" />
<action name="status" timeout="60" />
<action name="monitor" depth="0" timeout="30s" interval="10s" start-delay="0" />
<action name="meta-data" timeout="5s" />
<action name="validate-all"  timeout="5"/>
</actions>
</resource-agent>
END
        return $OCF_SUCCESS
}

validate_all_jboss()
{
        ocf_log info "validate_all_jboss[$RESOURCE_NAME]"
        return $OCF_SUCCESS
}

COMMAND=$1
RESOURCE_NAME="${OCF_RESKEY_resource_name-${OCF_RESOURCE_INSTANCE}}"
CONSOLE="${OCF_RESKEY_console-/var/log/${RESOURCE_NAME}.log}"
SHUTDOWN_TIMEOUT="${OCF_RESKEY_shutdown_timeout-5}"
KILL_TIMEOUT="${OCF_RESKEY_kill_timeout-10}"
JBOSS_USER="${OCF_RESKEY_user-root}"
STATUSURL="${OCF_RESKEY_statusurl-http://127.0.0.1:8080}"
PSTRING="${OCF_RESKEY_pstring-java -Dprogram.name=run.sh}"
RUN_OPTS="${OCF_RESKEY_run_opts--c default -l lpg4j}"
SHUTDOWN_OPTS="${OCF_RESKEY_shutdown_opts--s 127.0.0.1:1099}"

# test if these two are set and if directories exist and if the
# required scripts/binaries exist; use OCF_ERR_INSTALLED
JAVA_HOME="${OCF_RESKEY_java_home-${JAVA_HOME}}"
JBOSS_HOME="${OCF_RESKEY_jboss_home}"

if [ ! -d "$JAVA_HOME" -o ! -d "$JBOSS_HOME" ]; then
        case $COMMAND in
                stop)           exit    $OCF_SUCCESS;;
                monitor)        exit    $OCF_NOT_RUNNING;;
                status)         exit    $LSB_STATUS_STOPPED;;
                meta-data)      metadata_jboss;;
        esac
        ocf_log err "JAVA_HOME or JBOSS_HOME does not exist."
        exit $OCF_ERR_INSTALLED
fi

export JAVA_HOME JBOSS_HOME

JAVA=${JAVA_HOME}/bin/java

if [ ! -x "$JAVA" ]; then
        case $COMMAND in
                stop)           exit    $OCF_SUCCESS;;
                monitor)        exit    $OCF_NOT_RUNNING;;
                status)         exit    $LSB_STATUS_STOPPED;;
                meta-data)      metadata_jboss;;
        esac
        ocf_log err "java command does not exist."
        exit $OCF_ERR_INSTALLED
fi

case "$COMMAND" in
        start)
                #ocf_log debug  "[$RESOURCE_NAME] Enter jboss start"
                start_jboss
                func_status=$?
                #ocf_log debug  "[$RESOURCE_NAME] Leave jboss start $func_status"
                exit $func_status
                ;;
        stop)
                #ocf_log debug  "[$RESOURCE_NAME] Enter jboss stop"
                stop_jboss
                func_status=$?
                #ocf_log debug  "[$RESOURCE_NAME] Leave jboss stop $func_status"
                exit $func_status
                ;;
        status)
                status_jboss
                exit $?
                ;;
        monitor)
                monitor_jboss
                func_status=$?
                exit $func_status
                ;;
        # move meta-data above, so that it never fails
        meta-data)
                metadata_jboss
                exit $?
                ;;
        validate-all)
                validate_all_jboss
                exit $?
                ;;
        *)
                usage
                ;;
esac


I've modified it slightly from what was posted to add some debugging to figure 
out what's going on. The "echo" statements added are all un-indented to make 
them easy to spot. The script functionality is unchanged.

When I change this resource group from Stopped to Started, I see the Disk, IP, 
and MySQL resources change to Started. I then see the JBoss resource change to 
Started. Then, a few seconds later, it changes to Stopped.

Starting this JBoss application by hand, it takes a while to get to the point 
where it's fully deployed and running. "A while" in this case is anywhere from 
3 to 10 minutes. During that time, JBoss itself can be seen as running, but the 
attempt in the OCF script to verify that the application is working using wget 
will fail, since the provided URL isn't yet available.
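
To make the distinction concrete, here is a minimal sketch of the two-stage 
check described above: first "is the process there?", then "does the 
application answer?". The function name probe_state and its two command 
arguments are hypothetical and not part of the posted script; the return 
values 0, 1, and 7 are the standard OCF codes (OCF_SUCCESS, OCF_ERR_GENERIC, 
OCF_NOT_RUNNING) that the monitor path of the script returns.

```shell
#!/bin/sh
# Hypothetical helper, not part of the posted resource agent.
#   $1: command that succeeds when the JBoss process exists
#   $2: command that succeeds when the application URL answers
probe_state() {
    if ! sh -c "$1" >/dev/null 2>&1; then
        echo "not running"
        return 7    # OCF_NOT_RUNNING: no matching process
    fi
    if ! sh -c "$2" >/dev/null 2>&1; then
        echo "process up, application not ready"
        return 1    # OCF_ERR_GENERIC: process exists, URL check fails
    fi
    echo "running"
    return 0        # OCF_SUCCESS: process exists and URL answers
}
```

During the 3 to 10 minute deployment window, a call such as 
probe_state 'pgrep -f "java -Dprogram.name=run.sh"' 'wget -O /dev/null http://131.156.12.4:8080/IDMProv' 
would be expected to report the middle state.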

It looks like Heartbeat2 is not waiting long enough for the application to 
start. By adding the debugging "echo" statements to the jboss OCF script, I can 
see that first the (start) is called. When (start_jboss) is called, it builds 
and fires off the command line to start JBoss. Then it enters a loop where it 
calls (monitor_jboss). The first couple of times (monitor_jboss) is called, 
JBoss isn't running yet, so (monitor_jboss) returns and the loop continues. 
After a few times through this, (monitor_jboss) sees that JBoss is running and 
starts calling (isrunning_jboss). (isrunning_jboss) uses wget to see if the 
application is running, which it isn't yet since it's only been a few seconds 
and this application takes at least 3 minutes to get going. This loop then 
repeats a few more times. We're still in (start_jboss), calling 
(monitor_jboss), which calls (isrunning_jboss).
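
For reference, the loop in (start_jboss) has no upper bound of its own; it 
polls every 3 seconds until the check succeeds. A bounded variant of the same 
idea might look like the sketch below. This is only an illustration for 
timing the startup by hand, outside Heartbeat2; wait_until and its arguments 
are hypothetical names, not something the posted script provides.

```shell
#!/bin/sh
# Hypothetical helper: poll a command until it succeeds or a deadline passes.
#   $1: maximum time to wait, in seconds
#   $2: pause between attempts, in seconds
#   remaining arguments: the check command to run
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    waited=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # deadline reached, check never succeeded
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0            # check succeeded within the deadline
}
```

For example, wait_until 600 3 wget -O /dev/null http://131.156.12.4:8080/IDMProv 
would poll the status URL every 3 seconds for up to 10 minutes.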

This is where things go wrong. After about 20 seconds, Heartbeat2 calls (stop). 
This then goes off and kills the JBoss application that is still in the process 
of starting. After that, the resource is marked as "Stopped" in the CIB.

You can see this in the log written out by the "echo" commands in the jboss OCF 
script:

..................................................................................................................
JBoss   Thu May 6 15:17:35 CDT 2010
OCF_ROOT is /usr/lib/ocf
OCF_RESKEY_resource_name is IDMProv
OCF_RESKEY_console is /shared/uadisk/rbpm37/jboss.log
OCF_RESKEY_kill_timeout is
OCF_RESKEY_user is rbpm
OCF_RESKEY_statusurl is http://131.156.12.4:8080/IDMProv
OCF_RESKEY_java_home is /usr/java/jre1.6.0_18/
OCF_RESKEY_jboss_home is /shared/uadisk/rbpm37/jboss
OCF_RESKEY_pstring is 
OCF_RESKEY_run_opts is -c IDMProv -b 0.0.0.0
OCF_RESKEY_shutdown_opts is -S
pwd is /var/lib/heartbeat/cores/root
/JBoss
(Start Thu May  6 15:17:35 CDT 2010)
(Monitor Thu May  6 15:17:35 CDT 2010)
su - -s /bin/bash rbpm -c export JAVA_HOME=/usr/java/jre1.6.0_18/; export JBOSS_HOME=/shared/uadisk/rbpm37/jboss; /shared/uadisk/rbpm37/jboss/bin/run.sh -c IDMProv -b 0.0.0.0
  JBOSS_USER is rbpm
  JAVA_HOME is /usr/java/jre1.6.0_18/
  JBOSS_HOME is /shared/uadisk/rbpm37/jboss
  RUN_OPTS is -c IDMProv -b 0.0.0.0
  pwd is /var/lib/heartbeat/cores/root
  1. Thu May  6 15:17:35 CDT 2010
  2. Thu May  6 15:17:35 CDT 2010
  3. Thu May  6 15:17:35 CDT 2010
(Monitor Thu May  6 15:17:35 CDT 2010)
  3. Thu May  6 15:17:38 CDT 2010
(Monitor Thu May  6 15:17:38 CDT 2010)
(Is Running Thu May  6 15:17:38 CDT 2010)
  3. Thu May  6 15:17:42 CDT 2010
(Monitor Thu May  6 15:17:42 CDT 2010)
(Is Running Thu May  6 15:17:42 CDT 2010)
  3. Thu May  6 15:17:45 CDT 2010
(Monitor Thu May  6 15:17:45 CDT 2010)
(Is Running Thu May  6 15:17:45 CDT 2010)
  3. Thu May  6 15:17:48 CDT 2010
(Monitor Thu May  6 15:17:48 CDT 2010)
(Is Running Thu May  6 15:17:48 CDT 2010)
  3. Thu May  6 15:17:51 CDT 2010
(Monitor Thu May  6 15:17:51 CDT 2010)
(Is Running Thu May  6 15:17:51 CDT 2010)
  3. Thu May  6 15:17:54 CDT 2010
(Monitor Thu May  6 15:17:54 CDT 2010)
(Is Running Thu May  6 15:17:54 CDT 2010)
..................................................................................................................
JBoss   Thu May 6 15:17:56 CDT 2010
OCF_ROOT is /usr/lib/ocf
OCF_RESKEY_resource_name is IDMProv
OCF_RESKEY_console is /shared/uadisk/rbpm37/jboss.log
OCF_RESKEY_kill_timeout is
OCF_RESKEY_user is rbpm
OCF_RESKEY_statusurl is http://131.156.12.4:8080/IDMProv
OCF_RESKEY_java_home is /usr/java/jre1.6.0_18/
OCF_RESKEY_jboss_home is /shared/uadisk/rbpm37/jboss
OCF_RESKEY_pstring is 
OCF_RESKEY_run_opts is -c IDMProv -b 0.0.0.0
OCF_RESKEY_shutdown_opts is -S
pwd is /var/lib/heartbeat/cores/root
/JBoss
(Stop Thu May  6 15:17:57 CDT 2010)


Note the times shown for "(Start" and "(Stop". They are 15:17:35 and 15:17:57. 
Only 22 seconds have elapsed since Start was called, and now Stop is being 
called.
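
The arithmetic can be checked with a small sketch like the one below. The 
helper names to_seconds and elapsed are made up for this illustration, and it 
assumes both timestamps fall on the same day.

```shell
#!/bin/sh
# Convert an HH:MM:SS timestamp to seconds since midnight.
to_seconds() {
    old_ifs=$IFS
    IFS=:
    set -- $1           # split on ':' into hours, minutes, seconds
    IFS=$old_ifs
    # Strip one leading zero so values like "08" are not read as octal.
    h=${1#0}
    m=${2#0}
    s=${3#0}
    echo $((h * 3600 + m * 60 + s))
}

# Print the number of seconds between two same-day timestamps.
elapsed() {
    start_s=$(to_seconds "$1")
    end_s=$(to_seconds "$2")
    echo $((end_s - start_s))
}

elapsed 15:17:35 15:17:57    # prints 22
```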

My understanding is that the OCF script is working correctly. Its job is to 
start the resource, and to wait until it can verify that the resource is 
running before returning to Heartbeat2. The jboss OCF script is doing this, but 
Heartbeat2 isn't waiting for the script's start action to return. I have not, 
so far, found any way to influence this. As you can see from the <operations> 
block in the CIB, I have been cranking up the values of interval, timeout, and 
start_delay for the monitor, start, and stop operations. None of these changes 
seems to have any effect on what Heartbeat2 actually does. If Start hasn't 
returned successfully within about 20 seconds, Heartbeat2 considers it to have 
timed out and kills it.

What am I missing here?


