Check out the "Is This init Script LSB Compatible?" appendix in
http://clusterlabs.org/mw/Image:Configuration_Explained.pdf
Until your init script passes those tests, there is no point using it in
the cluster - it will only blow up.
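
The sequence that appendix walks through can be scripted roughly like this.
Treat it as a sketch: expect_rc and lsb_check are names made up for this
mail, and the expected codes (0 for start and stop even when the service is
already in that state, 0 for status while running, 3 for status while
stopped) are the LSB values as I understand them. It does not cover the
status-after-a-crash check, which you still need to do by hand.

```shell
#!/bin/sh
# Hypothetical helper: run one init-script action and compare its exit
# code with the expected value. Prints one line per step.
expect_rc() {
    script=$1
    action=$2
    want=$3
    got=0
    "$script" "$action" >/dev/null 2>&1 || got=$?
    if [ "$got" -eq "$want" ]; then
        echo "ok   $action -> $got"
    else
        echo "FAIL $action -> $got (expected $want)"
        return 1
    fi
}

# Hypothetical driver: walk the start/status/stop sequence in order.
# The return value is the number of failed steps (0 means all passed).
lsb_check() {
    script=$1
    failures=0
    expect_rc "$script" start  0 || failures=$((failures + 1))
    expect_rc "$script" status 0 || failures=$((failures + 1))
    expect_rc "$script" start  0 || failures=$((failures + 1))  # already running
    expect_rc "$script" stop   0 || failures=$((failures + 1))
    expect_rc "$script" status 3 || failures=$((failures + 1))  # stopped
    expect_rc "$script" stop   0 || failures=$((failures + 1))  # already stopped
    return "$failures"
}
```

Run it on one node with the cluster stopped, e.g.
lsb_check /etc/init.d/tomcat (the path is an assumption about where your
script lives), and fix anything that prints FAIL before putting the
resource back under CRM control.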
On Wed, Dec 3, 2008 at 18:09, Darren Mansell
<[EMAIL PROTECTED]> wrote:
> Hello everyone.
>
> I am trying to run a 2-node cluster with 1 shared IP for Tomcat. This
> works fine until I set the monitor operation on the Tomcat resource,
> at which point the CRM keeps trying to restart Tomcat indefinitely.
>
> Without the monitor operation in the CIB, it won't keep trying to restart
> Tomcat, but if I stop Tomcat manually it doesn't automatically get started
> again.
>
> I tried the tomcat OCF RA, but it has lots of incorrect values hard-coded,
> so I edited an init script into what I thought was an LSB-compatible
> form. This is the init script:
>
> #!/bin/sh
> # description: Start or stop the Tomcat server
> #
> ### BEGIN INIT INFO
> # Provides: tomcat
> # Required-Start: $network $syslog
> # Required-Stop: $network
> # Default-Start: 3
> # Default-Stop: 0
> # Description: Start or stop the Tomcat server
> ### END INIT INFO
>
> RETVAL=$?
> NAME=tomcat
> export JRE_HOME=/opt/java
> export CATALINA_HOME=/opt/$NAME
> export CATALINA_BASE=/opt/$NAME
> export JAVA_HOME=/opt/java
>
> check_running() {
> NAME=$1
> LINES=`ps -ef | grep java | grep opt | grep $NAME | grep -v grep | wc -l`
> [ $LINES -gt 0 ] && echo "yes"
> }
>
> case "$1" in
> 'start')
> RUNNING=`check_running $NAME`
> [ "$RUNNING" ] && exit 0
> if [ -f $CATALINA_HOME/bin/startup.sh ];
> then
> echo $"Starting Tomcat"
> $CATALINA_HOME/bin/startup.sh
> fi
> ;;
> 'stop')
> RUNNING=`check_running $NAME`
> [ ! "$RUNNING" ] && exit 0
> if [ -f $CATALINA_HOME/bin/shutdown.sh ];
> then
> echo $"Stopping Tomcat"
> $CATALINA_HOME/bin/shutdown.sh
> fi
> ;;
> 'restart')
> $0 stop
> sleep 15
> $0 start
> ;;
> 'status')
> RUNNING=`check_running $NAME`
> [ "$RUNNING" ] && exit 0 || exit 1;;
> *)
> echo
> echo $"Usage: $0 {start|stop}"
> echo
> exit 1;;
>
> esac
> exit $RETVAL
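
One thing that stands out against the appendix: the 'status' branch above
exits 1 when Tomcat is not running, whereas the LSB spec uses 3 for
"program is not running" (1 means dead with a stale pid file). Whether that
is what trips the monitor here I can't say for sure, but it will fail the
tests. A minimal sketch of a status helper, assuming a process check along
the lines of your check_running; tomcat_running and lsb_status are
hypothetical names, and the pgrep pattern is a guess at your install path:

```shell
#!/bin/sh
# Sketch only: tomcat_running is a hypothetical stand-in for the
# check_running test in the quoted script. pgrep -f matches against the
# full command line; the pattern assumes Tomcat lives under CATALINA_HOME.
tomcat_running() {
    pgrep -f "java.*${CATALINA_HOME:-/opt/tomcat}" >/dev/null 2>&1
}

# Return the LSB status code instead of exiting, so the caller decides
# when to leave the script.
lsb_status() {
    if tomcat_running; then
        return 0    # LSB: program is running
    fi
    return 3        # LSB: program is not running
}
```

In the case statement the branch would then read something like
'status') lsb_status; exit $? ;;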
>
> This is my cib.xml
>
> <cib generated="true" admin_epoch="0" have_quorum="true" ignore_dtd="false"
> num_peers="2" cib_feature_revision="1.3" crm_feature_set="2.0" epoch="125"
> num_updates="82" cib-last-written="Wed Dec 3 16:45:56 2008"
> ccm_transition="2" dc_uuid="ae4489bf-2c5d-4cfd-bf81-5e25b11932eb">
> <configuration>
> <crm_config>
> <cluster_property_set id="cib-bootstrap-options">
> <attributes>
> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
> value="2.1.3-node: a3184d5240c6e7032aef9cce6e5b7752ded544b3"/>
> </attributes>
> </cluster_property_set>
> </crm_config>
> <nodes>
> <node id="7e9a5233-d24c-441f-9f14-03352172f08b" uname="hs-node2"
> type="normal"/>
> <node id="ae4489bf-2c5d-4cfd-bf81-5e25b11932eb" uname="hs-node1"
> type="normal"/>
> </nodes>
> <resources>
> <clone id="tomcat">
> <instance_attributes id="5908d3eb-7d48-4c7d-bcca-9020f8eadc87">
> <attributes>
> <nvpair name="clone_max" value="2"
> id="19a0d76d-9697-4d19-8990-0f098d299a4f"/>
> <nvpair name="clone_node_max" value="1"
> id="de765b64-ece4-4c19-9659-13e20b60d9bb"/>
> </attributes>
> </instance_attributes>
> <group id="tomcat_group">
> <primitive id="ip_1" class="ocf" type="IPaddr" provider="heartbeat">
> <instance_attributes id="e79760a4-c715-477a-a4b7-85eab9bf9ae9">
> <attributes>
> <nvpair name="ip" value="2.21.2.5"
> id="07540941-f4f8-4bd0-ac78-7d62f212145a"/>
> </attributes>
> </instance_attributes>
> </primitive>
> <primitive id="tomcat_1" class="lsb" type="tomcat"
> provider="heartbeat">
> <operations>
> <op id="monitor_tomcat" interval="120s" name="monitor"
> timeout="60s"/>
> </operations>
> </primitive>
> </group>
> </clone>
> </resources>
> <constraints/>
> </configuration>
>
> This is the ha.cf:
>
> udpport 694
> autojoin none
> crm true
> ucast eth0 2.21.2.4
> ucast eth0 2.21.2.3
> node hs-node1
> node hs-node2
> respawn root /sbin/evmsd
> apiauth evms uid=hacluster,root
>
> This is what crm_mon says:
>
> ============
> Last updated: Wed Dec 3 17:26:47 2008
> Current DC: hs-node1 (ae4489bf-2c5d-4cfd-bf81-5e25b11932eb)
> 2 Nodes configured.
> 1 Resources configured.
> ============
>
> Node: hs-node2 (7e9a5233-d24c-441f-9f14-03352172f08b): online
> Node: hs-node1 (ae4489bf-2c5d-4cfd-bf81-5e25b11932eb): online
>
> Clone Set: tomcat
> Resource Group: tomcat_group:0
> ip_1:0 (ocf::heartbeat:IPaddr): Started hs-node2
> tomcat_1:0 (lsb:tomcat): Started hs-node2 FAILED
> Resource Group: tomcat_group:1
> ip_1:1 (ocf::heartbeat:IPaddr): Started hs-node1
> tomcat_1:1 (lsb:tomcat): Stopped
>
> Failed actions:
> tomcat_1:0_monitor_120000 (node=hs-node2, call=809, rc=7): complete
>
> It was working but suddenly stopped, and I have no idea why. If anyone could
> provide any pointers, that would be great. I'm using:
>
> SLES 10 SP2
> Heartbeat 2.1.3
>
> Thanks
>
> Darren Mansell
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>