I'm new to Pacemaker / OpenAIS / Corosync / Linux-HA, and have been
going through the "Clusters from Scratch v2" tutorial at
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf.

I could not get the CRM to start apache, but could start it directly via
"/etc/init.d/httpd start".  My symptoms were identical to those
described in the Pacemaker mailing list post:
http://www.mail-archive.com/[email protected]/msg02479.html

/var/log/httpd/error.log seemed to suggest that apache was starting,
then catching a SIGTERM 1 second later.

I tried the suggested solutions (ensuring that apache's PidFile and
ExtendedStatus were enabled), but neither helped.

I debugged the RA directly, and found what I believe to be a race
condition in the monitor_apache() function.  The first thing it does is
to call silent_status(), which basically checks to see if there is a PID
file for apache, and that there is a running process with that PID.  If
that is true, it then calls monitor_apache_basic(), which, with my
configuration, wget's http://localhost:80.  If the wget fails, return
code 1 is passed upwards.
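
To make the flow concrete, here is a simplified sketch of the monitor
logic as I understand it. This is not the RA's actual code: the
sketch_* function names, the PIDFILE default and the return value 7
(OCF_NOT_RUNNING) for the "not running" case are my own assumptions for
illustration.

```shell
#!/bin/sh
# Simplified sketch only; NOT the code from the apache RA.
# PIDFILE and STATUSURL defaults are assumed values for illustration.
PIDFILE="${PIDFILE:-/var/run/httpd.pid}"
STATUSURL="${STATUSURL:-http://localhost:80}"

# Succeeds if the PID file exists and names a live process.
sketch_silent_status() {
    [ -f "$PIDFILE" ] || return 1
    pid=$(cat "$PIDFILE" 2>/dev/null)
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Succeeds if the web server answers an HTTP request.
sketch_monitor_basic() {
    wget -q -O /dev/null "$STATUSURL"
}

sketch_monitor() {
    sketch_silent_status || return 7   # assumed: OCF_NOT_RUNNING
    sketch_monitor_basic || return 1   # generic error, passed upwards
    return 0
}
```

The point is that the first check can pass while the second still fails.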

The race condition is that when apache is started, it is possible for it
to have written its PID file, but not yet completed its initialization
to the point where the wget would succeed.  I was able to work around
this problem by placing a simple "sleep 5" between starting httpd and
the first call to monitor_apache().
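
A fixed sleep is crude. A less fragile variant of the same workaround
might poll until the check succeeds or a timeout expires. The sketch
below is untested against the real RA; wait_for_check and its arguments
are hypothetical names of my own, not anything from resource-agents.

```shell
#!/bin/sh
# Hypothetical helper, not part of the apache RA.
# Usage: wait_for_check MAX_TRIES COMMAND [ARGS...]
# Runs COMMAND up to MAX_TRIES times, sleeping 1 second between
# attempts; returns 0 as soon as it succeeds, 1 if every attempt fails.
wait_for_check() {
    max_tries=$1; shift
    tries=0
    while ! "$@"; do
        tries=$((tries + 1))
        [ "$tries" -ge "$max_tries" ] && return 1
        sleep 1
    done
    return 0
}

# Example use after starting httpd (URL as in my configuration):
# wait_for_check 10 wget -q -O /dev/null http://localhost:80
```

This returns as soon as apache answers, rather than always waiting the
full 5 seconds, and still fails if apache never comes up.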

I'm running Fedora 13 as a VirtualBox VM guest on a Win7 host, Pacemaker
1.1.3, Corosync 1.2.8, and apache RA from resource-agents 3.0.16.

I would appreciate it if someone more familiar with this RA could
double-check my theory.

Cheers,
Brett