Hello everyone. I am trying to run a 2 node cluster with 1 shared IP for Tomcat. This works fine until I set the monitor operation inside the Tomcat resource where the CRM keeps trying to restart Tomcat over and over infinitely.
Without the monitor operation in the CIB it won't keep trying to restart
Tomcat but if I stop it manually it doesn't automatically get started
again.
I tried the tomcat OCF RA but there are lots of incorrect values hard
coded in so I edited up an init script to what I thought was LSB
compatible. This is the init script:
#!/bin/sh
# description: Start or stop the Tomcat server
#
### BEGIN INIT INFO
# Provides: tomcat
# Required-Start: $network $syslog
# Required-Stop: $network
# Default-Start: 3
# Default-Stop: 0
# Description: Start or stop the Tomcat server
### END INIT INFO
RETVAL=$?
NAME=tomcat
export JRE_HOME=/opt/java
export CATALINA_HOME=/opt/$NAME
export CATALINA_BASE=/opt/$NAME
export JAVA_HOME=/opt/java
check_running() {
NAME=$1
LINES=`ps -ef | grep java | grep opt | grep $NAME | grep -v grep | wc
-l `
[ $LINES -gt 0 ] && echo "yes"
}
case "$1" in
'start')
RUNNING=`check_running $NAME`
[ "$RUNNING" ] && exit 0
if [ -f $CATALINA_HOME/bin/startup.sh ];
then
echo $"Starting Tomcat"
$CATALINA_HOME/bin/startup.sh
fi
;;
'stop')
RUNNING=`check_running $NAME`
[ ! "$RUNNING" ] && exit 0
if [ -f $CATALINA_HOME/bin/shutdown.sh ];
then
echo $"Stopping Tomcat"
$CATALINA_HOME/bin/shutdown.sh
fi
;;
'restart')
$0 stop
sleep 15
$0 start
;;
'status')
RUNNING=`check_running $NAME`
[ "$RUNNING" ] && exit 0 || exit 1;;
*)
echo
echo $"Usage: $0 {start|stop}"
echo
exit 1;;
esac
exit $RETVAL
This is my cib.xml
<cib generated="true" admin_epoch="0" have_quorum="true" ignore_dtd="false"
num_peers="2" cib_feature_revision="1.3" crm_feature_set="2.0" epoch="125"
num_updates="82" cib-last-written="Wed Dec 3 16:45:56 2008" ccm_transition="2"
dc_uuid="ae4489bf-2c5d-4cfd-bf81-5e25b11932eb">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<attributes>
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="2.1.3-node: a3184d5240c6e7032aef9cce6e5b7752ded544b3"/>
</attributes>
</cluster_property_set>
</crm_config>
<nodes>
<node id="7e9a5233-d24c-441f-9f14-03352172f08b" uname="hs-node2"
type="normal"/>
<node id="ae4489bf-2c5d-4cfd-bf81-5e25b11932eb" uname="hs-node1"
type="normal"/>
</nodes>
<resources>
<clone id="tomcat">
<instance_attributes id="5908d3eb-7d48-4c7d-bcca-9020f8eadc87">
<attributes>
<nvpair name="clone_max" value="2"
id="19a0d76d-9697-4d19-8990-0f098d299a4f"/>
<nvpair name="clone_node_max" value="1"
id="de765b64-ece4-4c19-9659-13e20b60d9bb"/>
</attributes>
</instance_attributes>
<group id="tomcat_group">
<primitive id="ip_1" class="ocf" type="IPaddr" provider="heartbeat">
<instance_attributes id="e79760a4-c715-477a-a4b7-85eab9bf9ae9">
<attributes>
<nvpair name="ip" value="2.21.2.5"
id="07540941-f4f8-4bd0-ac78-7d62f212145a"/>
</attributes>
</instance_attributes>
</primitive>
<primitive id="tomcat_1" class="lsb" type="tomcat"
provider="heartbeat">
<operations>
<op id="monitor_tomcat" interval="120s" name="monitor"
timeout="60s"/>
</operations>
</primitive>
</group>
</clone>
</resources>
<constraints/>
</configuration>
This is the ha.cf:
udpport 694
autojoin none
crm true
ucast eth0 2.21.2.4
ucast eth0 2.21.2.3
node hs-node1
node hs-node2
respawn root /sbin/evmsd
apiauth evms uid=hacluster,root
This is what crm_mon says:
============
Last updated: Wed Dec 3 17:26:47 2008
Current DC: hs-node1 (ae4489bf-2c5d-4cfd-bf81-5e25b11932eb)
2 Nodes configured.
1 Resources configured.
============
Node: hs-node2 (7e9a5233-d24c-441f-9f14-03352172f08b): online
Node: hs-node1 (ae4489bf-2c5d-4cfd-bf81-5e25b11932eb): online
Clone Set: tomcat
Resource Group: tomcat_group:0
ip_1:0 (ocf::heartbeat:IPaddr): Started hs-node2
tomcat_1:0 (lsb:tomcat): Started hs-node2 FAILED
Resource Group: tomcat_group:1
ip_1:1 (ocf::heartbeat:IPaddr): Started hs-node1
tomcat_1:1 (lsb:tomcat): Stopped
Failed actions:
tomcat_1:0_monitor_120000 (node=hs-node2, call=809, rc=7): complete
It was working but suddenly stopped and I have no idea why. If anyone could
provide any pointers that would be great. I'm using:
SLES 10 SP2
Heartbeat 2.1.3
Thanks
Darren Mansell
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
