I'm not entirely new to Heartbeat2, but I've run into something here that I have not been able to figure out. What I'm trying to do is create a JBoss resource, as part of a resource group (disk, ip, mysql, jboss), for an application. I have the disk, ip, and MySQL resources working; it's just the JBoss resource that's proving more difficult than expected. This particular JBoss application takes a while to get fully started, which is where I think I'm running into trouble. More on that below.

The servers are SLES10, and I'm using Heartbeat2:

Linux sles10-3 2.6.16.60-0.34-default #1 Fri Jan 16 14:59:01 UTC 2009 i686 i686 i386 GNU/Linux
heartbeat-2.1.4-0.11

I've defined my JBoss application resource:

<primitive class="ocf" type="jboss" provider="heartbeat" is_managed="true" id="JBoss_4">
  <instance_attributes id="JBoss_4_instance_attrs">
    <attributes>
      <nvpair name="resource_name" value="IDMProv" id="4609063b-c767-4956-a2f9-f44f46b634a9"/>
      <nvpair name="console" value="/shared/uadisk/rbpm37/jboss.log" id="a78e869b-2b00-474c-987e-5919c1ce80e7"/>
      <nvpair name="shutdown_timeout" value="60" id="16812ae8-4f3c-46df-bdbe-24f7e0fcd557"/>
      <nvpair name="user" value="rbpm" id="f8ccf1ed-c701-4dc0-b8e5-d66c779a8b9f"/>
      <nvpair name="statusurl" value="http://131.156.12.4:8080/IDMProv" id="11d32111-4e2b-4224-99d2-20af4eb43eb8"/>
      <nvpair name="java_home" value="/usr/java/jre1.6.0_18/" id="2881e29c-4b87-4094-bffc-d0d9e9682e16"/>
      <nvpair name="jboss_home" value="/shared/uadisk/rbpm37/jboss" id="d21e26f2-3e2f-4a98-9ecd-8f934b844434"/>
      <nvpair name="run_opts" value="-c IDMProv -b 0.0.0.0" id="681efa45-d7f3-4ef0-b90d-5d652b6480d6"/>
      <nvpair name="shutdown_opts" value="-S" id="f10343d3-28cc-4497-b234-e4678dda5818"/>
    </attributes>
  </instance_attributes>
  <operations>
    <op name="monitor" interval="10s" timeout="600s" start_delay="600s" id="317075e9-dabb-4923-9129-be16882f94a4"/>
    <op name="start" interval="900s" timeout="600s" start_delay="10s" id="ab407ca5-78e4-48e9-bee0-f70f64d011e4"/>
    <op name="stop" interval="10s" timeout="600s" start_delay="10s" id="33d5a72d-105c-4da0-99cb-b25de520a5ae"/>
  </operations>
</primitive>

This has gone through many iterations over the last few days. This is what's currently in the CIB.

This particular Heartbeat2 version didn't include a JBoss OCF script, but I obtained this one from the list archives:

#!/bin/sh
#
# Description:  Manages a Jboss Server as an OCF High-Availability
#               resource under Heartbeat/LinuxHA control
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
# 02110-1301, USA.
#
# Copyright (c) 2009 Bauer Systems KG / Stefan Schluppeck
#
#########################################################################################################################################
# OCF parameters:
#   OCF_RESKEY_resource_name - The name of the resource. Default is ${OCF_RESOURCE_INSTANCE}
# why not let the RA log through lrmd?
# 2009/09/09 Nakahira:
# jboss_console is used to record output of the "run.sh".
# The log of "Run.sh" should not be output to ha-log because it is so annoying.
#   OCF_RESKEY_console - A destination of the log of jboss run and shutdown script. Default is /var/log/${OCF_RESKEY_resource_name}.log
#   OCF_RESKEY_shutdown_timeout - Time-out at the time of the stop. Default is 5
#   OCF_RESKEY_kill_timeout - The re-try number of times awaiting a stop. Default is 10
#   OCF_RESKEY_user - A user name to start a JBoss. Default is root
#   OCF_RESKEY_statusurl - URL for state confirmation. Default is http://127.0.0.1:8080
#   OCF_RESKEY_java_home - Home directory of the Java. Default is ${JAVA_HOME}
#   OCF_RESKEY_jboss_home - Home directory of Jboss. Default is None
# is it possible to devise this string from options? I'm afraid
# that allowing users to set this could be error prone.
# 2009/09/09 Nakahira:
# It is difficult to set it automatically because jboss_pstring
# greatly depends on the environment. At any rate, system architect
# should note that pstring doesn't influence other processes.
#   OCF_RESKEY_pstring - String Jboss will found in procceslist. Default is "java -Dprogram.name=run.sh"
#   OCF_RESKEY_run_opts - Options for jboss to run. Default is "-c default -l lpg4j"
#   OCF_RESKEY_shutdown_opts - Options for jboss to shutdown. Default is "-s 127.0.0.1:1099"
#########################################################################################################################################

##################################################################################################################
# OCF_ROOT="/usr/lib/ocf"
# OCF_RESKEY_resource_name
# OCF_RESKEY_console
# OCF_RESKEY_shutdown_timeout
# OCF_RESKEY_kill_timeout
# OCF_RESKEY_user
# OCF_RESKEY_statusurl
# OCF_RESKEY_java_home
# OCF_RESKEY_jboss_home
# OCF_RESKEY_pstring
# OCF_RESKEY_run_opts
# OCF_RESKEY_shutdown_opts

# OCF_ROOT="/usr/lib/ocf"
# OCF_RESKEY_resource_name="IDMProv"
# OCF_RESOURCE_INSTANCE="1"
# OCF_RESKEY_console="/shared/uadisk/rbpm37/jboss.log"
# OCF_RESKEY_user="rbpm"
# OCF_RESKEY_statusurl="http://131.156.12.4:8080/IDMProv"
# OCF_RESKEY_java_home="/usr/java/jre1.6.0_18/"
# OCF_RESKEY_jboss_home="/shared/uadisk/rbpm37/jboss"
# OCF_RESKEY_run_opts="-c IDMProv -b 0.0.0.0"
# OCF_RESKEY_shutdown_opts="-S"

echo .................................................................................................................. >> /shared/uadisk/rbpm37/ocf2.out
echo JBoss " " `date` >> /shared/uadisk/rbpm37/ocf2.out
echo OCF_ROOT is ${OCF_ROOT} >> /shared/uadisk/rbpm37/ocf2.out
echo OCF_RESKEY_resource_name is ${OCF_RESKEY_resource_name} >> /shared/uadisk/rbpm37/ocf2.out
echo OCF_RESKEY_console is ${OCF_RESKEY_console} >> /shared/uadisk/rbpm37/ocf2.out
echo OCF_RESKEY_kill_timeout is ${OCF_RESKEY_kill_timeout} >> /shared/uadisk/rbpm37/ocf2.out
echo OCF_RESKEY_user is ${OCF_RESKEY_user} >> /shared/uadisk/rbpm37/ocf2.out
echo OCF_RESKEY_statusurl is ${OCF_RESKEY_statusurl} >> /shared/uadisk/rbpm37/ocf2.out
echo OCF_RESKEY_java_home is ${OCF_RESKEY_java_home} >> /shared/uadisk/rbpm37/ocf2.out
echo OCF_RESKEY_jboss_home is ${OCF_RESKEY_jboss_home} >> /shared/uadisk/rbpm37/ocf2.out
echo "OCF_RESKEY_pstring is ${OCF_RESKEY_pstring}" >> /shared/uadisk/rbpm37/ocf2.out
echo OCF_RESKEY_run_opts is ${OCF_RESKEY_run_opts} >> /shared/uadisk/rbpm37/ocf2.out
echo OCF_RESKEY_shutdown_opts is ${OCF_RESKEY_shutdown_opts} >> /shared/uadisk/rbpm37/ocf2.out
echo pwd is `pwd` >> /shared/uadisk/rbpm37/ocf2.out
echo /JBoss >> /shared/uadisk/rbpm37/ocf2.out
##################################################################################################################

. ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs

usage()
{
    cat <<-!
usage: $0 action

action:
    start         start jboss
    stop          stop the jboss
    status        return the status of jboss, run or down
    monitor       return TRUE if the jboss appears to be working.
                  You have to have installed $WGETNAME for this to work.
    meta-data     show meta data message
    validate-all  validate the instance parameters
!
    return $OCF_ERR_ARGS
}

isrunning_jboss()
{
echo "(Is Running `date`)" >> /shared/uadisk/rbpm37/ocf2.out
    if wget -O /dev/null $STATUSURL 2>/dev/null; then
        return $OCF_SUCCESS
    fi
    # JBoss service error
    return $OCF_ERR_GENERIC
}

monitor_jboss()
{
echo "(Monitor `date`)" >> /shared/uadisk/rbpm37/ocf2.out
    if ! pgrep -f "$PSTRING" > /dev/null; then
        return $OCF_NOT_RUNNING
    fi
    isrunning_jboss
}

start_jboss()
{
echo "(Start `date`)" >> /shared/uadisk/rbpm37/ocf2.out
    monitor_jboss
    if [ $? = $OCF_SUCCESS ]; then
        return $OCF_SUCCESS
    fi

    ocf_log info "Starting JBoss[$RESOURCE_NAME]"
    if [ "$JBOSS_USER" = root ]; then
        "$JBOSS_HOME/bin/run.sh" $RUN_OPTS \
            >> "$CONSOLE" 2>&1 &
    else
echo "su - -s /bin/bash $JBOSS_USER -c export JAVA_HOME=$JAVA_HOME; export JBOSS_HOME=$JBOSS_HOME; $JBOSS_HOME/bin/run.sh $RUN_OPTS" >> /shared/uadisk/rbpm37/ocf2.out
echo " JBOSS_USER is $JBOSS_USER" >> /shared/uadisk/rbpm37/ocf2.out
echo " JAVA_HOME is $JAVA_HOME" >> /shared/uadisk/rbpm37/ocf2.out
echo " JBOSS_HOME is $JBOSS_HOME" >> /shared/uadisk/rbpm37/ocf2.out
echo " RUN_OPTS is $RUN_OPTS" >> /shared/uadisk/rbpm37/ocf2.out
echo " pwd is `pwd`" >> /shared/uadisk/rbpm37/ocf2.out
echo " 1. `date`" >> /shared/uadisk/rbpm37/ocf2.out
        su - -s /bin/bash "$JBOSS_USER" \
            -c "export JAVA_HOME=${JAVA_HOME};\
                export JBOSS_HOME=${JBOSS_HOME};\
                $JBOSS_HOME/bin/run.sh $RUN_OPTS" \
            >> "$CONSOLE" 2>&1 &
echo " 2. `date`" >> /shared/uadisk/rbpm37/ocf2.out
    fi

    while true; do
echo " 3. `date`" >> /shared/uadisk/rbpm37/ocf2.out
        monitor_jboss
        if [ $? = $OCF_SUCCESS ]; then
            break
        fi
        ocf_log debug "start_jboss[$RESOURCE_NAME]: retry monitor_jboss"
        sleep 3
    done
    sleep 5
echo " 4. `date`" >> /shared/uadisk/rbpm37/ocf2.out
    return $OCF_SUCCESS
}

stop_jboss()
{
echo "(Stop `date`)" >> /shared/uadisk/rbpm37/ocf2.out
    ocf_log info "Stopping JBoss[$RESOURCE_NAME]"
    if [ "$JBOSS_USER" = root ]; then
        "$JBOSS_HOME/bin/shutdown.sh" $SHUTDOWN_OPTS -S \
            >> "$CONSOLE" 2>&1 &
    else
        su - -s /bin/bash "$JBOSS_USER" \
            -c "export JAVA_HOME=${JAVA_HOME};\n
                export JBOSS_HOME=${JBOSS_HOME};\n
                $JBOSS_HOME/bin/shutdown.sh $SHUTDOWN_OPTS -S" \
            >> "$CONSOLE" 2>&1 &
    fi

    lapse_sec=0
    while pgrep -f "$PSTRING" > /dev/null; do
        sleep 1
        lapse_sec=`expr $lapse_sec + 1`
        ocf_log info "stop_jboss[$RESOURCE_NAME]: stop NORM $lapse_sec/$SHUTDOWN_TIMEOUT"
        if [ $lapse_sec -ge $SHUTDOWN_TIMEOUT ]; then
            break
        fi
    done

    if pgrep -f "$PSTRING" > /dev/null; then
        ocf_log info "stop_jboss[$RESOURCE_NAME]: output a JVM thread dump to $CONSOLE"
        pkill -QUIT -f "$PSTRING"
        lapse_sec=0
        while true; do
            sleep 1
            lapse_sec=`expr $lapse_sec + 1`
            ocf_log info "stop_jboss[$RESOURCE_NAME]: kill jboss by SIGTERM ($lapse_sec/$KILL_TIMEOUT)"
            pkill -TERM -f "$PSTRING"
            if pgrep -f "$PSTRING" > /dev/null; then
                if [ $lapse_sec -ge $KILL_TIMEOUT ]; then
                    break
                fi
            else
                break
            fi
        done
    fi

    # If the JBoss process hangs, JBoss RA waits $SHUTDOWN_TIMEOUT
    # seconds and tries kill TERM and QUIT for $KILL_TIMEOUT seconds.
    # The stop timeout of RA should be
    # longer than $SHUTDOWN_TIMEOUT + $KILL_TIMEOUT.
    lapse_sec=0
    while pgrep -f "$PSTRING" > /dev/null; do
        sleep 1
        lapse_sec=`expr $lapse_sec + 1`
        ocf_log info "stop_jboss[$RESOURCE_NAME]: kill jboss by SIGKILL ($lapse_sec/@@@)"
        pkill -KILL -f "$PSTRING"
    done
    return $OCF_SUCCESS
}

status_jboss()
{
echo "(Status `date`)" >> /shared/uadisk/rbpm37/ocf2.out
    if ! pgrep -f "$PSTRING" > /dev/null; then
        echo "JBoss process[$RESOURCE_NAME] is not running."
        return $OCF_NOT_RUNNING
    fi

    if isrunning_jboss; then
        echo "JBoss[$RESOURCE_NAME] is running."
        return $OCF_SUCCESS
    else
        echo "JBoss process[$RESOURCE_NAME] is running."
        echo "But, we can not access JBoss web service."
        return $OCF_NOT_RUNNING
    fi
}

metadata_jboss()
{
    cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="jboss">
<version>1.0</version>

<longdesc lang="en">
Resource script for Jboss. It manages a Jboss instance as an HA resource.
</longdesc>
<shortdesc lang="en">jboss resource agent</shortdesc>

<parameters>

<parameter name="resource_name" unique="1" required="0">
<longdesc lang="en">
The name of the resource. Defaults to the name of the resource
instance.
</longdesc>
<shortdesc>The name of the resource</shortdesc>
<content type="string" default="${OCF_RESOURCE_INSTANCE}" />
</parameter>

<parameter name="console" unique="1" required="0">
<longdesc lang="en">
A destination of the log of jboss run and shutdown script.
</longdesc>
<shortdesc>jboss log path</shortdesc>
<content type="string" default="" />
</parameter>

<parameter name="shutdown_timeout" unique="0" required="0">
<longdesc lang="en">
Timeout for jboss bin/shutdown.sh. We wait for this timeout to
expire, then send the TERM and QUIT signals. Finally, the KILL
signal is used to terminate the jboss process. You should set the
timeout for the stop operation to a value bigger than the sum of
the timeout parameters. See also kill_timeout.
</longdesc>
<shortdesc>shutdown timeout</shortdesc>
<content type="integer" default="5" />
</parameter>

<parameter name="kill_timeout" unique="0" required="0">
<longdesc lang="en">
If bin/shutdown.sh doesn't stop the jboss process, then we send
it TERM and QUIT signals, intermittently and once a second. After
this timeout expires, if the process is still live, we use the
KILL signal. See also shutdown_timeout.
</longdesc>
<shortdesc>stop by signal timeout</shortdesc>
<content type="integer" default="10" />
</parameter>

<parameter name="user" unique="0" required="0">
<longdesc lang="en">
A user name to start a JBoss.
</longdesc>
<shortdesc>A user name to start a resource.</shortdesc>
<content type="string" default="root"/>
</parameter>

<parameter name="statusurl" unique="0" required="0">
<longdesc lang="en">
URL to test in the monitor operation.
</longdesc>
<shortdesc>URL to test in the monitor operation.</shortdesc>
<content type="string" default="http://127.0.0.1:8080" />
</parameter>

<parameter name="java_home" unique="0" required="0">
<longdesc lang="en">
Home directory of Java.
</longdesc>
<shortdesc>Home directory of Java.</shortdesc>
<content type="string" default=""/>
</parameter>

<parameter name="jboss_home" unique="1" required="1">
<longdesc lang="en">
Home directory of Jboss.
</longdesc>
<shortdesc>Home directory of Jboss.</shortdesc>
<content type="string" default=""/>
</parameter>

<parameter name="pstring" unique="0" required="0">
<longdesc lang="en">
With this string heartbeat matches for the right process to kill.
</longdesc>
<shortdesc>pkill/pgrep search string</shortdesc>
<content type="string" default="java -Dprogram.name=run.sh" />
</parameter>

<parameter name="run_opts" unique="0" required="0">
<longdesc lang="en">
Start options to start Jboss with, defaults are from the Jboss-Doku.
</longdesc>
<shortdesc>options for jboss run.sh</shortdesc>
<content type="string" default="-c default -l lpg4j" />
</parameter>

<parameter name="shutdown_opts" unique="0" required="0">
<longdesc lang="en">
Stop options to stop Jboss with.
</longdesc>
<shortdesc>options for jboss shutdown.sh</shortdesc>
<content type="string" default="-s 127.0.0.1:1099" />
</parameter>

</parameters>

<actions>
<action name="start" timeout="60s" />
<action name="stop" timeout="120s" />
<action name="status" timeout="60" />
<action name="monitor" depth="0" timeout="30s" interval="10s" start-delay="0" />
<action name="meta-data" timeout="5s" />
<action name="validate-all" timeout="5"/>
</actions>
</resource-agent>
END
    return $OCF_SUCCESS
}

validate_all_jboss()
{
    ocf_log info "validate_all_jboss[$RESOURCE_NAME]"
    return $OCF_SUCCESS
}

COMMAND=$1
RESOURCE_NAME="${OCF_RESKEY_resource_name-${OCF_RESOURCE_INSTANCE}}"
CONSOLE="${OCF_RESKEY_console-/var/log/${RESOURCE_NAME}.log}"
SHUTDOWN_TIMEOUT="${OCF_RESKEY_shutdown_timeout-5}"
KILL_TIMEOUT="${OCF_RESKEY_kill_timeout-10}"
JBOSS_USER="${OCF_RESKEY_user-root}"
STATUSURL="${OCF_RESKEY_statusurl-http://127.0.0.1:8080}"
PSTRING="${OCF_RESKEY_pstring-java -Dprogram.name=run.sh}"
RUN_OPTS="${OCF_RESKEY_run_opts--c default -l lpg4j}"
SHUTDOWN_OPTS="${OCF_RESKEY_shutdown_opts--s 127.0.0.1:1099}"

# test if these two are set and if directories exist and if the
# required scripts/binaries exist; use OCF_ERR_INSTALLED
JAVA_HOME="${OCF_RESKEY_java_home-${JAVA_HOME}}"
JBOSS_HOME="${OCF_RESKEY_jboss_home}"

if [ ! -d "$JAVA_HOME" -o ! -d "$JBOSS_HOME" ]; then
    case $COMMAND in
        stop) exit $OCF_SUCCESS;;
        monitor) exit $OCF_NOT_RUNNING;;
        status) exit $LSB_STATUS_STOPPED;;
        meta-data) metadata_jboss;;
    esac
    ocf_log err "JAVA_HOME or JBOSS_HOME does not exist."
    exit $OCF_ERR_INSTALLED
fi

export JAVA_HOME JBOSS_HOME

JAVA=${JAVA_HOME}/bin/java

if [ ! -x "$JAVA" ]; then
    case $COMMAND in
        stop) exit $OCF_SUCCESS;;
        monitor) exit $OCF_NOT_RUNNING;;
        status) exit $LSB_STATUS_STOPPED;;
        meta-data) metadata_jboss;;
    esac
    ocf_log err "java command does not exist."
    exit $OCF_ERR_INSTALLED
fi

case "$COMMAND" in
    start)
        #ocf_log debug "[$RESOURCE_NAME] Enter jboss start"
        start_jboss
        func_status=$?
        #ocf_log debug "[$RESOURCE_NAME] Leave jboss start $func_status"
        exit $func_status
        ;;
    stop)
        #ocf_log debug "[$RESOURCE_NAME] Enter jboss stop"
        stop_jboss
        func_status=$?
        #ocf_log debug "[$RESOURCE_NAME] Leave jboss stop $func_status"
        exit $func_status
        ;;
    status)
        status_jboss
        exit $?
        ;;
    monitor)
        monitor_jboss
        func_status=$?
        exit $func_status
        ;;
    # move meta-data above, so that it never fails
    meta-data)
        metadata_jboss
        exit $?
        ;;
    validate-all)
        validate_all_jboss
        exit $?
        ;;
    *)
        usage
        ;;
esac

I've modified it slightly from what was posted to add some debugging to figure out what's going on. The "echo" statements added are all un-indented to make them easy to spot. The script functionality is unchanged.

When I change this resource group from Stopped to Started, I see the Disk, IP, and MySQL resource change to Started. I then see the JBoss resource change to Started. Then, a few seconds later, it changes to Stopped.

Starting this JBoss application by hand, it takes a while to get to the point where it's fully deployed and running. "A while" in this case is anywhere from 3 to 10 minutes. During that time, JBoss itself can be seen as running, but the attempt in the OCF script to verify that the application is working using wget will fail, since the provided URL isn't yet available. It looks like Heartbeat2 is not waiting long enough for the application to start.

By adding the debugging "echo" statements to the jboss OCF script, I can see that first the (start) is called. When (start_jboss) is called, it builds and fires off the command line to start JBoss. Then it enters a loop where it calls (monitor_jboss). The first couple of times (monitor_jboss) is called, JBoss isn't running yet, so (monitor_jboss) returns and the loop continues. After a few times through this, (monitor_jboss) sees that JBoss is running and starts calling (isrunning_jboss).

(isrunning_jboss) uses wget to see if the application is running, which it isn't yet, since it's only been a few seconds and this application takes at least 3 minutes to get going. This loop then repeats a few more times. We're still in (start_jboss), calling (monitor_jboss), which calls (isrunning_jboss).

This is where things go wrong. After about 20 seconds, Heartbeat2 calls (stop). This then goes off and kills the JBoss application that is still in the process of starting, after which the resource is marked as "Stopped" in the CIB.

You can see this in the log written out by the "echo" commands in the jboss OCF script:

..................................................................................................................
JBoss Thu May 6 15:17:35 CDT 2010
OCF_ROOT is /usr/lib/ocf
OCF_RESKEY_resource_name is IDMProv
OCF_RESKEY_console is /shared/uadisk/rbpm37/jboss.log
OCF_RESKEY_kill_timeout is
OCF_RESKEY_user is rbpm
OCF_RESKEY_statusurl is http://131.156.12.4:8080/IDMProv
OCF_RESKEY_java_home is /usr/java/jre1.6.0_18/
OCF_RESKEY_jboss_home is /shared/uadisk/rbpm37/jboss
OCF_RESKEY_pstring is
OCF_RESKEY_run_opts is -c IDMProv -b 0.0.0.0
OCF_RESKEY_shutdown_opts is -S
pwd is /var/lib/heartbeat/cores/root
/JBoss
(Start Thu May 6 15:17:35 CDT 2010)
(Monitor Thu May 6 15:17:35 CDT 2010)
su - -s /bin/bash rbpm -c export JAVA_HOME=/usr/java/jre1.6.0_18/; export JBOSS_HOME=/shared/uadisk/rbpm37/jboss; /shared/uadisk/rbpm37/jboss/bin/run.sh -c IDMProv -b 0.0.0.0
 JBOSS_USER is rbpm
 JAVA_HOME is /usr/java/jre1.6.0_18/
 JBOSS_HOME is /shared/uadisk/rbpm37/jboss
 RUN_OPTS is -c IDMProv -b 0.0.0.0
 pwd is /var/lib/heartbeat/cores/root
 1. Thu May 6 15:17:35 CDT 2010
 2. Thu May 6 15:17:35 CDT 2010
 3. Thu May 6 15:17:35 CDT 2010
(Monitor Thu May 6 15:17:35 CDT 2010)
 3. Thu May 6 15:17:38 CDT 2010
(Monitor Thu May 6 15:17:38 CDT 2010)
(Is Running Thu May 6 15:17:38 CDT 2010)
 3. Thu May 6 15:17:42 CDT 2010
(Monitor Thu May 6 15:17:42 CDT 2010)
(Is Running Thu May 6 15:17:42 CDT 2010)
 3. Thu May 6 15:17:45 CDT 2010
(Monitor Thu May 6 15:17:45 CDT 2010)
(Is Running Thu May 6 15:17:45 CDT 2010)
 3. Thu May 6 15:17:48 CDT 2010
(Monitor Thu May 6 15:17:48 CDT 2010)
(Is Running Thu May 6 15:17:48 CDT 2010)
 3. Thu May 6 15:17:51 CDT 2010
(Monitor Thu May 6 15:17:51 CDT 2010)
(Is Running Thu May 6 15:17:51 CDT 2010)
 3. Thu May 6 15:17:54 CDT 2010
(Monitor Thu May 6 15:17:54 CDT 2010)
(Is Running Thu May 6 15:17:54 CDT 2010)
..................................................................................................................
JBoss Thu May 6 15:17:56 CDT 2010
OCF_ROOT is /usr/lib/ocf
OCF_RESKEY_resource_name is IDMProv
OCF_RESKEY_console is /shared/uadisk/rbpm37/jboss.log
OCF_RESKEY_kill_timeout is
OCF_RESKEY_user is rbpm
OCF_RESKEY_statusurl is http://131.156.12.4:8080/IDMProv
OCF_RESKEY_java_home is /usr/java/jre1.6.0_18/
OCF_RESKEY_jboss_home is /shared/uadisk/rbpm37/jboss
OCF_RESKEY_pstring is
OCF_RESKEY_run_opts is -c IDMProv -b 0.0.0.0
OCF_RESKEY_shutdown_opts is -S
pwd is /var/lib/heartbeat/cores/root
/JBoss
(Stop Thu May 6 15:17:57 CDT 2010)

Note the times shown for "(Start" and "(Stop". They are 15:17:35 and 15:17:57. Only 22 seconds have elapsed since Start was called, and now Stop is being called.

My understanding is that the OCF script is working correctly. Its job is to start the resource, and to wait until it can verify that the resource is running before returning to Heartbeat2. The jboss OCF script is doing this, but Heartbeat2 isn't waiting for the script's start command to return.

I have not, so far, found any way to influence this. As you can see from the <operations> block in the CIB, I have been cranking up the values of interval, timeout, and start_delay for the monitor, start, and stop operations. None of these changes seems to have any effect on what Heartbeat2 actually does.
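
To double-check the 22-second figure, here is a small sketch (my own helper, not part of the agent) that pulls the "(Start" and "(Stop" timestamps out of log lines in the format shown above and subtracts them. The function name to_seconds and the inline sample are just for illustration, and it assumes both entries fall on the same day.

```shell
#!/bin/sh
# Two lines copied from the debug log above; in practice these would
# come from the ocf2.out file written by the echo statements.
log='(Start Thu May 6 15:17:35 CDT 2010)
(Stop Thu May 6 15:17:57 CDT 2010)'

# to_seconds TAG: find the first "(TAG ..." line and convert its
# HH:MM:SS field (the 5th whitespace-separated field) to seconds
# since midnight. Hypothetical helper, same-day entries only.
to_seconds() {
    printf '%s\n' "$log" | awk -v tag="($1" '
        $1 == tag {
            split($5, t, ":")
            print t[1] * 3600 + t[2] * 60 + t[3]
            exit
        }'
}

start=$(to_seconds Start)
stop=$(to_seconds Stop)
elapsed=$((stop - start))
echo "elapsed between Start and Stop: ${elapsed}s"
```

For the two timestamps in the log this works out to 22 seconds, which is nowhere near the 600s timeout I configured for the start operation.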

If Start hasn't returned successful within about 20 seconds, Heartbeat2 considers it to have timed out and kills it. What am I missing here?
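
For anyone who wants to see the call pattern without a JBoss install, below is a stripped-down sketch of the wait loop in start_jboss. It is only an illustration under my own assumptions: fake_monitor and fake_start are made-up stand-ins for monitor_jboss and start_jboss, the poll counts are arbitrary, the sleep is left out, and the return codes are defined inline instead of being sourced from .ocf-shellfuncs.

```shell
#!/bin/sh
# Return codes as used by the agent (normally provided by .ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

polls=0

# Stand-in for monitor_jboss: the process is "not running" for the first
# 2 polls, then "running but URL not answering" for 3 polls, then healthy.
# The counts are arbitrary and only there to drive the loop.
fake_monitor() {
    polls=$((polls + 1))
    if [ "$polls" -le 2 ]; then return $OCF_NOT_RUNNING; fi
    if [ "$polls" -le 5 ]; then return $OCF_ERR_GENERIC; fi
    return $OCF_SUCCESS
}

# Stand-in for the wait loop in start_jboss. Like the real one, it has
# no upper bound of its own: it only exits once the monitor succeeds.
fake_start() {
    while true; do
        fake_monitor
        rc=$?
        echo "poll $polls -> rc=$rc"
        if [ "$rc" -eq "$OCF_SUCCESS" ]; then
            break
        fi
    done
    return $OCF_SUCCESS
}

fake_start
start_rc=$?
echo "start returned $start_rc after $polls polls"
```

The point of the sketch is that the start action cannot return until the URL check succeeds, so how long it is allowed to run is entirely up to whoever calls it. In my case that caller is Heartbeat2, and it gives up after roughly 20 seconds.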
