Hi,

On Fri, Jul 09, 2010 at 01:04:09PM +0100, Brett Delle Grazie wrote:
> Hi,
>
> Yes checked both logs:
>
> Catalina.out specifies normal (successful) Tomcat startup.
>
> tc-1.log (log from backgrounded start/stop operations):
>
> Doesn't give anything unusual:
> 2010/07/09 09:42:13: start ===========================
> 2010/07/09 10:20:46: stop ###########################
> 2010/07/09 10:27:35: start ===========================
> 2010/07/09 12:50:20: stop ###########################
> 2010/07/09 12:50:26: start ===========================
>
> Yes, I realise these are from later runs but the same thing is still
> occurring.
>
> Is it possible that the start operation doesn't close one of
> the file descriptors and is left 'hanging' - even though
> it exits (at least from the perspective of pacemaker)?
>
> Would this explain the ownership of the 'tomcat start' process
> by 'init' instead of by pacemaker?
No. lrmd kills the process if it doesn't exit within the timeout.
By "ownership" I guess you mean the parent process. The RA
process (/usr/lib/ocf/.../tomcat start) is a child of the lrmd.
init can become its parent only if lrmd exits.

What is the timeout for that start operation set to? Does the
process remain even after that timeout? What happens to lrmd?

> > > > heartbeat-libs-3.0.3-1

Where does that come from? Normally, you should have
cluster-libs. Perhaps you need to update.

Thanks,

Dejan

> Thanks,
>
> Best Regards,
>
> Brett
>
>
> -----Original Message-----
> From: Dejan Muhamedagic [mailto:[email protected]]
> Sent: Fri 09/07/2010 12:54
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] Tomcat Resource Agent always leaves dead process on
> stop or restart
>
> Hi,
>
> On Fri, Jul 09, 2010 at 12:29:40PM +0100, Brett Delle Grazie wrote:
> >
> > Hi,
> >
> > Now we come to the fun part...
> >
> > When I first started looking at this I thought the monitor code in the
> > agent was wrong:
> >
> > ############################################################################
> > # Check tomcat process and service availability
> > monitor_tomcat()
> > {
> >     isalive_tomcat ||
> >         return $OCF_NOT_RUNNING
> >     isrunning_tomcat ||
> >         return $OCF_NOT_RUNNING
> >     return $OCF_SUCCESS
> > }
> >
> > Both pgrep and wget return 0 if successful, thus so do isalive_tomcat and
> > isrunning_tomcat.
> > However this appears correct.
> >
> > So I'm _really_ confused about why this is not exiting.
> >
> > Any ideas?
>
> The logs should say. Did you check the tomcat logs too?
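For illustration, the return-code logic of the quoted monitor_tomcat function can be exercised in isolation. This is only a sketch under assumptions: isalive_stub and isrunning_stub are hypothetical stand-ins for the agent's pgrep and wget checks, and the ALIVE and RUNNING variables exist only to drive them. The values 0 and 7 are the standard OCF codes for OCF_SUCCESS and OCF_NOT_RUNNING.

```sh
#!/bin/sh
# Sketch of the "check || return" chain used by the quoted monitor function.
# The stubs below are hypothetical; the real agent uses pgrep and wget.

OCF_SUCCESS=0
OCF_NOT_RUNNING=7

ALIVE=0      # exit status the process check should report
RUNNING=0    # exit status the status-URL check should report

isalive_stub()   { return "$ALIVE"; }
isrunning_stub() { return "$RUNNING"; }

monitor_sketch() {
    # Each check returns 0 on success, so "||" fires only on failure.
    isalive_stub ||
        return $OCF_NOT_RUNNING
    isrunning_stub ||
        return $OCF_NOT_RUNNING
    return $OCF_SUCCESS
}

ALIVE=0 RUNNING=0; monitor_sketch; rc_both=$?     # both checks pass
ALIVE=1 RUNNING=0; monitor_sketch; rc_noproc=$?   # process check fails
ALIVE=0 RUNNING=1; monitor_sketch; rc_nohttp=$?   # URL check fails

echo "$rc_both $rc_noproc $rc_nohttp"
```

Under these assumptions the function reports success only when both checks pass, which agrees with the observation in the quoted message that the monitor code itself looks correct.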
>
> Thanks,
>
> Dejan
>
> > Thanks,
> >
> > Regards,
> >
> > Brett
> >
> >
> > -----Original Message-----
> > From: Dejan Muhamedagic [mailto:[email protected]]
> > Sent: Fri 09/07/2010 11:53
> > To: General Linux-HA mailing list
> > Subject: Re: [Linux-HA] Tomcat Resource Agent always leaves dead process on
> > stop or restart
> >
> > Hi,
> >
> > On Fri, Jul 09, 2010 at 11:41:37AM +0100, Brett Delle Grazie wrote:
> > > Hi Dejan,
> > >
> > > Thanks for your response.
> > >
> > > You are correct the backgrounded process used to start tomcat by the
> > > resource agent isn't exiting the way it should - the question is why?
> > >
> > > Ignore the incorrect date on the example - I killed the wrong leftover
> > > process before setting up the example.
> > >
> > > restarting tomcat is performed by:
> > >
> > > crm resource restart cl_tomcat_tc1
> > >
> > > To the best of my knowledge this performs a 'stop' and then a 'start'.
> >
> > Right. Note that "stop" won't run before the current action on
> > the resource is done.
> >
> > > Where cl_tomcat_tc1 is a clone tomcat resource.
> > >
> > > Any ideas why the backgrounded process doesn't exit?
> >
> > The start action is like this:
> >
> >     java ... start &
> >     while not monitor:
> >         sleep
> >
> > If the monitor never succeeds, then lrmd will kill the process
> > once the timeout for the start operation expires. At any rate,
> > lrmd always makes sure that there's only one operation on the
> > resource at a time.
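The start logic described above (launch in the background, then poll the monitor until it succeeds) can be sketched as a small shell script. This is a minimal illustration, not the agent's actual code: fake_service_start, fake_monitor, start_sketch and STATE_FILE are hypothetical names, and the one-second delay merely imitates a service that takes a moment to come up. The loop deliberately has no timeout of its own, mirroring the description that lrmd enforces the timeout from outside.

```sh
#!/bin/sh
# Sketch of the start-then-poll pattern; every name here is a stand-in.

OCF_SUCCESS=0
OCF_NOT_RUNNING=7

STATE_FILE="${TMPDIR:-/tmp}/start_sketch.$$"

# Stand-in for "catalina.sh start": the service is "up" after one second.
fake_service_start() {
    sleep 1
    : > "$STATE_FILE"
}

# Stand-in for the monitor: succeeds once the service is up.
fake_monitor() {
    [ -e "$STATE_FILE" ] && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

start_sketch() {
    fake_service_start &        # backgrounded, like the '&' in the agent
    while ! fake_monitor; do    # no internal timeout: the caller must kill
        sleep 1                 # this loop if the monitor never succeeds
    done
    return $OCF_SUCCESS
}

start_sketch
rc=$?
rm -f "$STATE_FILE"
echo "start returned $rc"
```

If fake_monitor never succeeded, start_sketch would loop forever, which is the situation in which a start process stays around until something external terminates it.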
> >
> > Thanks,
> >
> > Dejan
> >
> > > Thanks,
> > >
> > > Best Regards,
> > >
> > > Brett
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Dejan Muhamedagic [mailto:[email protected]]
> > > Sent: Fri 09/07/2010 11:32
> > > To: General Linux-HA mailing list
> > > Subject: Re: [Linux-HA] Tomcat Resource Agent always leaves dead process
> > > on stop or restart
> > >
> > > Hi,
> > >
> > > On Thu, Jul 08, 2010 at 10:35:57AM +0100, Brett Delle Grazie wrote:
> > > >
> > > > Hi,
> > > >
> > > > I'm using RHEL5.5 in a Heartbeat/Pacemaker cluster managing Tomcat and
> > > > Apache HTTPD on two nodes using the ocf:heartbeat:tomcat resource agent
> > > > for Tomcat.
> > > >
> > > > Specific versions:
> > > > resource-agents 1.0.3-1
> > > > heartbeat-libs-3.0.3-1
> > > > heartbeat-3.0.3-1
> > > > pacemaker-1.0.8-1.0hg20100317.8debc1902e13
> > > > Tomcat 6.0.26 (downloaded from source).
> > > >
> > > > I have modified the Tomcat resource agent to be capable of
> > > > controlling multiple Tomcat instances by exporting
> > > > CATALINA_BASE as well as CATALINA_HOME - these are the only
> > > > changes I've made to the resource agent (this is why the agent
> > > > path is 'intact' instead of 'heartbeat' in the process list
> > > > below) - diff attached.
> > > >
> > > > When manually issuing a restart of the clone resource on tomcat
> > > > I'm left with a dead 'start' process:
> > >
> > > Doesn't look dead to me, just that it didn't exit.
> > >
> > > > (before restart):
> > > > [r...@fmp-dun-tapp1 ~]# ps -efH | grep [t]omcat
> > > > root 22754 21037 0 10:09 pts/0 00:00:00 grep tomcat
> > > > root 5058 1 0 Jul07 ? 00:00:00 /bin/sh
> > > > /usr/lib/ocf/resource.d//intact/tomcat start
> > > > tomcat 5101 1 0 Jul07 ? 00:00:19
> > > > /usr/lib/jvm/java/bin/java
> > > > -Djava.util.logging.config.file=/home/tomcat/tc-1/conf/logging.properties
> > > > -Dname=tomcat -Djava.awt.headless=true -Djava.library.path=/usr/lib64
> > > > -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> > > > -Xmx1024M -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
> > > > /opt/tomcat/bin/bootstrap.jar -Dcatalina.base=/home/tomcat/tc-1
> > > > -Dcatalina.home=/opt/tomcat -Djava.io.tmpdir=/home/tomcat/tc-1/temp
> > > > org.apache.catalina.startup.Bootstrap start
> > > >
> > > > (after restart):
> > > > [r...@fmp-dun-tapp1 ~]# ps -efH | grep [t]omcat
> > > > root 5058 1 0 Jul07 ? 00:00:00 /bin/sh
> > > > /usr/lib/ocf/resource.d//intact/tomcat start
> > >
> > > This looks like an old process, judging by the date. Perhaps you
> > > killed (using -9) some processes so this one remained hanging?
> > > Otherwise, this is not possible, i.e. only one operation on a
> > > resource is run.
> > >
> > > > root 2271 1 0 10:26 ? 00:00:00 /bin/sh
> > > > /usr/lib/ocf/resource.d//intact/tomcat start
> > > > tomcat 2307 1 21 10:26 ? 00:00:02
> > > > /usr/lib/jvm/java/bin/java
> > > > -Djava.util.logging.config.file=/home/tomcat/tc-1/conf/logging.properties
> > > > -Dname=tomcat -Djava.awt.headless=true -Djava.library.path=/usr/lib64
> > > > -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> > > > -Xmx1024M -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
> > > > /opt/tomcat/bin/bootstrap.jar -Dcatalina.base=/home/tomcat/tc-1
> > > > -Dcatalina.home=/opt/tomcat -Djava.io.tmpdir=/home/tomcat/tc-1/temp
> > > > org.apache.catalina.startup.Bootstrap start
> > > >
> > > > Note the two 'tomcat start' processes above.
> > > > Each restart produces successively more copies of the 'tomcat start'
> > > > process.
> > >
> > > What is a "restart"? How does it happen?
> > >
> > > > Does anyone know why this would occur? I thought the call to
> > > > 'catalina.sh start' which is backgrounded in tomcat_start
> > > > function in resource should exit after starting Tomcat - but
> > > > apparently it doesn't.
> > >
> > > That's strange. The '&' at the end of the line certainly makes it
> > > run in the background. Otherwise, the start action goes into an
> > > infinite loop waiting for the monitor of the resource to succeed.
> > > If it never does, then lrmd will time out and kill the process.
> > >
> > > Thanks,
> > >
> > > Dejan
> > >
> > >
> > > > Any help / trouble-shooting tips appreciated.
> > > >
> > > > Thanks,
> > > >
> > > > Best Regards,
> > > >
> > > > Brett
> > > >
> > > >
> > > > ______________________________________________________________________
> > > > This email has been scanned by the MessageLabs Email Security System.
> > > > For more information please visit http://www.messagelabs.com/email
> > > > ______________________________________________________________________
> > >
> > > > --- tomcat 2010-07-08 09:34:13.000000000 +0100
> > > > +++ tomcat.intact 2010-07-08 10:24:51.000000000 +0100
> > > > @@ -29,7 +29,9 @@
> > > >  # OCF_RESKEY_tomcat_user - A user name to start a resource. Default is root
> > > >  # OCF_RESKEY_statusurl - URL for state confirmation. Default is http://127.0.0.1:8080
> > > >  # OCF_RESKEY_java_home - Home directory of the Java. Default is None
> > > > +# OCF_RESKEY_java_opts - Options to parse to Java. Always adds -Dname=OCF_RESKEY_tomcat_name
> > > >  # OCF_RESKEY_catalina_home - Home directory of Tomcat. Default is None
> > > > +# OCF_RESKEY_catalina_base - Base directory of Tomcat. Default is None
> > > >  # OCF_RESKEY_catalina_pid - A PID file name of Tomcat. Default is OCF_RESKEY_catalina_home/logs/catalina.pid
> > > >  # OCF_RESKEY_tomcat_start_opts - Start options of the tomcat. Default is None.
> > > >  # OCF_RESKEY_catalina_opts - CATALINA_OPTS environment variable.
> > > > Default is None.
> > > > @@ -147,11 +149,12 @@
> > > >          >> "$TOMCAT_CONSOLE" 2>&1 &
> > > >      else
> > > >          su - -s /bin/sh "$RESOURCE_TOMCAT_USER" \
> > > > -            -c "export JAVA_HOME=${OCF_RESKEY_java_home};\n
> > > > -                export JAVA_OPTS=-Dname=${TOMCAT_NAME};\n
> > > > -                export CATALINA_HOME=${OCF_RESKEY_catalina_home};\n
> > > > -                export CATALINA_PID=${OCF_RESKEY_catalina_pid};\n
> > > > -                export CATALINA_OPTS=\"${OCF_RESKEY_catalina_opts}\";\n
> > > > +            -c "export JAVA_HOME=${OCF_RESKEY_java_home};\
> > > > +                export JAVA_OPTS=\"-Dname=${TOMCAT_NAME} ${OCF_RESKEY_java_opts}\";\
> > > > +                export CATALINA_HOME=${OCF_RESKEY_catalina_home};\
> > > > +                export CATALINA_BASE=${OCF_RESKEY_catalina_base};\
> > > > +                export CATALINA_PID=${OCF_RESKEY_catalina_pid};\
> > > > +                export CATALINA_OPTS=\"${OCF_RESKEY_catalina_opts}\";\
> > > >                  $CATALINA_HOME/bin/catalina.sh start ${OCF_RESKEY_tomcat_start_opts}" \
> > > >              >> "$TOMCAT_CONSOLE" 2>&1 &
> > > >      fi
> > > > @@ -182,10 +185,11 @@
> > > >          eval $tomcat_stop_cmd >> "$TOMCAT_CONSOLE" 2>&1
> > > >      else
> > > >          su - -s /bin/sh "$RESOURCE_TOMCAT_USER" \
> > > > -            -c "export JAVA_HOME=${OCF_RESKEY_java_home};\n
> > > > -                export JAVA_OPTS=-Dname=${TOMCAT_NAME};\n
> > > > -                export CATALINA_HOME=${OCF_RESKEY_catalina_home};\n
> > > > -                export CATALINA_PID=${OCF_RESKEY_catalina_pid};\n
> > > > +            -c "export JAVA_HOME=${OCF_RESKEY_java_home};\
> > > > +                export JAVA_OPTS=\"-Dname=${TOMCAT_NAME} ${OCF_RESKEY_java_opts}\";\
> > > > +                export CATALINA_HOME=${OCF_RESKEY_catalina_home};\
> > > > +                export CATALINA_BASE=${OCF_RESKEY_catalina_base};\
> > > > +                export CATALINA_PID=${OCF_RESKEY_catalina_pid};\
> > > >                  $CATALINA_HOME/bin/catalina.sh stop" \
> > > >              >> "$TOMCAT_CONSOLE" 2>&1
> > > >      fi
> > > > @@ -316,6 +320,14 @@
> > > >  <content type="string" default="" />
> > > >  </parameter>
> > > >
> > > > +<parameter name="java_opts" unique="0">
> > > > +<longdesc lang="en">
> > > > +Java options
> > > > +</longdesc>
> > > > +<shortdesc>Java options</shortdesc>
> > > > +<content type="string" default="" />
> > > > +</parameter>
> > > > +
> > > >  <parameter name="catalina_home" unique="1" required="1">
> > > >  <longdesc lang="en">
> > > >  Home directory of Tomcat
> > > > @@ -324,6 +336,14 @@
> > > >  <content type="string" default="" />
> > > >  </parameter>
> > > >
> > > > +<parameter name="catalina_base" unique="1">
> > > > +<longdesc lang="en">
> > > > +Instance directory of Tomcat
> > > > +</longdesc>
> > > > +<shortdesc>Instance directory of Tomcat</shortdesc>
> > > > +<content type="string" default="" />
> > > > +</parameter>
> > > > +
> > > >  <parameter name="catalina_pid" unique="1">
> > > >  <longdesc lang="en">
> > > >  A PID file name of Tomcat
> > > > @@ -397,9 +417,10 @@
> > > >  RESOURCE_STATUSURL="${OCF_RESKEY_statusurl-http://127.0.0.1:8080}"
> > > >
> > > >  JAVA_HOME="${OCF_RESKEY_java_home}"
> > > > -JAVA_OPTS="-Dname=$TOMCAT_NAME"
> > > > +JAVA_OPTS="-Dname=$TOMCAT_NAME ${OCF_RESKEY_java_opts}"
> > > >  SEARCH_STR="\\""${JAVA_OPTS}"
> > > >  CATALINA_HOME="${OCF_RESKEY_catalina_home}"
> > > > +CATALINA_BASE="${OCF_RESKEY_catalina_base}"
> > > >
> > > >  CATALINA_PID="${OCF_RESKEY_catalina_pid-$CATALINA_HOME/logs/catalina.pid}"
> > > >
> > > >  TOMCAT_START_OPTS="${OCF_RESKEY_tomcat_start_opts}"
> > > > @@ -407,7 +428,7 @@
> > > >  CATALINA_ROTATE_LOG="${OCF_RESKEY_catalina_rotate_log-NO}"
> > > >  CATALINA_ROTATETIME="${OCF_RESKEY_catalina_rotatetime-86400}"
> > > >
> > > > -export JAVA_HOME JAVA_OPTS CATALINA_HOME CATALINA_PID CATALINA_OPTS
> > > > +export JAVA_HOME JAVA_OPTS CATALINA_HOME CATALINA_BASE CATALINA_PID CATALINA_OPTS
> > > >
> > > >  JAVA=${JAVA_HOME}/bin/java
> > > >
> > >
> > > > _______________________________________________
> > > > Linux-HA mailing list
> > > > [email protected]
> > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > See also: http://linux-ha.org/ReportingProblems
