On Fri, Oct 15, 2010 at 01:16:21PM -0400, Buckingham, Brett wrote:
> >> The race condition is that when apache is started, it is possible for it
> >> to have written its PID file, but not yet completed its initialization
> >> to the point where the wget would succeed. I was able to work around
> >> this problem by placing a simple "sleep 5" after starting httpd and the
> >> first call to monitor_apache().
>
> >If that's the case, then the start action should loop on
> >monitor_apache internally until that returns Ok.
> >That way, start will only return once monitoring does actually work.
> >Bonus: you get a start failure already, if monitoring is not configured
> >properly.
>
> >... looking at the code ...
> >Wait. It does that already, since May 2007.
>
> start_apache() only loops monitoring if monitor_apache() returns
> $OCF_NOT_RUNNING (7). monitor_apache() returned 1 ($OCF_ERR_GENERIC)
> due to the control flow described above.
>
> I think what is needed is specific monitoring logic for apache startup
> which allows for the PID file to be there but some period of time before
> an HTTP request is returned. Once apache is running, I agree that the
> monitor_apache() function, which requires the PID file, process matching
> the pid, and a successful wget is OK.
Ok, so the problem is that monitor_apache_basic
(and monitor_apache_extended) return one of $OCF_ERR_CONFIGURED,
$OCF_SUCCESS or $OCF_ERR_GENERIC (or even $OCF_ERR_ARGS, if you get grep to
exit with 2 for a bad parameter...),
but the loop breaks for anything != $OCF_NOT_RUNNING.
What about the patchlet below?
diff --git a/heartbeat/apache b/heartbeat/apache
--- a/heartbeat/apache
+++ b/heartbeat/apache
@@ -404,19 +404,11 @@ start_apache() {
return $OCF_SUCCESS
fi
ocf_run $HTTPD $HTTPDOPTS $OPTIONS -f $CONFIGFILE
- ...
+ # loop until we are killed because of start action timeout,
+ # or monitor returns success, whatever comes first.
+ while ! monitor_apache; do
+ ocf_log info "waiting for apache $CONFIGFILE to come up"
+ sleep 1
done
- ...
possible todo:
* log should probably not be done every loop iteration
* exit early for $OCF_ERR_CONFIGURED
* bonus points:
OCF_ERR_CONFIGURED is only a start failure,
if a monitor action is configured?
* maybe don't even enter the loop, if ocf_run exits != 0?
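To make the difference between the old loop and the patchlet concrete, here is a stand-alone sketch, not the agent code. fake_monitor, old_start_loop, new_start_loop and interval are made-up names for illustration only; the stub pretends that the PID file is there immediately but the HTTP check only succeeds on the third call.

```sh
#!/bin/sh
# Stand-alone sketch; nothing here is taken from the real resource agent.
# The numeric values follow the usual OCF return code convention.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Assumption: the agent sleeps 1 second per try; 0 keeps this demo fast.
interval=0

# Hypothetical stand-in for monitor_apache: fails with a generic error
# on the first two calls, then succeeds.
calls=0
fake_monitor() {
	calls=$((calls + 1))
	if [ "$calls" -lt 3 ]; then
		return $OCF_ERR_GENERIC
	fi
	return $OCF_SUCCESS
}

# Old behaviour: keep waiting only while the monitor says "not running".
old_start_loop() {
	calls=0
	while :; do
		fake_monitor
		ec=$?
		[ "$ec" -eq "$OCF_NOT_RUNNING" ] || break
		sleep "$interval"
	done
	return $ec
}

# Patchlet behaviour: keep waiting until the monitor succeeds.
new_start_loop() {
	calls=0
	while ! fake_monitor; do
		sleep "$interval"
	done
	return $OCF_SUCCESS
}

rc=0; old_start_loop || rc=$?
echo "old loop: rc=$rc after $calls monitor call(s)"
rc=0; new_start_loop || rc=$?
echo "new loop: rc=$rc after $calls monitor call(s)"
```

With this stub, the old loop gives up with rc=1 after a single monitor call, while the new loop returns rc=0 after three calls.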
Second iteration below, just written down, not tested even once.
Please comment.
diff --git a/heartbeat/apache b/heartbeat/apache
--- a/heartbeat/apache
+++ b/heartbeat/apache
@@ -404,24 +404,46 @@ start_apache() {
return $OCF_SUCCESS
fi
ocf_run $HTTPD $HTTPDOPTS $OPTIONS -f $CONFIGFILE
+ # loop until we are killed because of start action timeout,
+ # or monitor returns a final exit code, whatever comes first.
tries=0
- while : # wait until the user set timeout
- do
+ while :; do
monitor_apache
- ec=$?
- if [ $ec -eq $OCF_NOT_RUNNING ]
- then
- tries=`expr $tries + 1`
- ocf_log info "waiting for apache $CONFIGFILE to come up"
- sleep 1
- else
- break
- fi
+ rc=$?
+ case $rc in
+ $OCF_SUCCESS)
+ return $OCF_SUCCESS
+ ;;
+ $OCF_ERR_CONFIGURED)
+ # Is only returned if silent_status was ok already, that means
+ # the apache process is at least (or, was...) running,
+ # but monitor_apache_basic failed.
+ #
+ # Possibly the parameters necessary for monitor_apache_basic
+ # are wrong, but maybe they are just missing.
+ #
+ # If the user has not configured a monitor action, that's not fatal.
+ # If he has, then the next run of that monitor action will pick
+ # it up anyways. So we just return success here.
+ return $OCF_SUCCESS
+ ;;
+ $OCF_NOT_RUNNING|$OCF_ERR_GENERIC)
+ # not ready yet
+ ;;
+ *)
+ # should not happen. But treat as "not ready yet", anyways.
+ ;;
+ esac
+
+ # even though this may look like a bashism, it is not,
+ # but POSIX since at least 1997.
+ : $((tries=tries + 1))
+ if [ $((tries % 20)) = 2 ] ; then
+ ocf_log info "waiting for apache $CONFIGFILE to come up"
+ fi
+ sleep 1
done
- if [ $ec -ne 0 ] && silent_status; then
- stop_apache
- fi
- return $ec
+ # not reached.
}
stop_apache() {
@@ -496,7 +518,7 @@ monitor_apache_extended() {
fixtesturl
is_testconf_sane ||
return $OCF_ERR_CONFIGURED
- $whattorun "$test_url" | grep -Ei "$test_regex" > /dev/null
+ $whattorun "$test_url" | grep -Eie "$test_regex" > /dev/null
}
monitor_apache_basic() {
if [ -z "$STATUSURL" ]; then
@@ -506,7 +528,7 @@ monitor_apache_basic() {
ocf_log err "could not find a http client; make sure that either wget or curl is available"
return $OCF_ERR_CONFIGURED
fi
- ${ourhttpclient}_func "$STATUSURL" | grep -Ei "$TESTREGEX" > /dev/null
+ ${ourhttpclient}_func "$STATUSURL" | grep -Eie "$TESTREGEX" > /dev/null
}
monitor_apache() {
silent_status
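On the grep change: presumably the point of adding -e is that a test regex starting with a dash would otherwise be parsed as grep options, which is one way to get the "bad parameter" exit mentioned above. A minimal sketch, assuming a made-up regex and page text; match_without_e and match_with_e are hypothetical helpers, not functions from the agent:

```sh
#!/bin/sh
# Hypothetical test regex that happens to start with a dash.
regex='-OK-'
page='status: -OK-'

# Without -e, grep parses "$2" as a bundle of options and is left
# without any pattern, so it fails instead of matching.
match_without_e() {
	printf '%s\n' "$1" | grep -Ei "$2" >/dev/null 2>&1
}

# With -e, the argument that follows is always taken as the pattern.
match_with_e() {
	printf '%s\n' "$1" | grep -Eie "$2" >/dev/null 2>&1
}

if match_without_e "$page" "$regex"; then
	echo "without -e: matched"
else
	echo "without -e: grep failed or did not match"
fi

if match_with_e "$page" "$regex"; then
	echo "with -e: matched"
else
	echo "with -e: grep failed or did not match"
fi
```

For regexes that do not start with a dash, both forms behave the same, so the change should be harmless for existing configurations.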
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.