I'm new to Pacemaker / OpenAIS / Corosync / Linux-HA, and have been going through the "Clusters from Scratch v2" tutorial at http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf.
I could not get the CRM to start apache, but could start it directly via "/etc/init.d/httpd start". My symptoms were identical to those described in this Pacemaker mailing list post: http://www.mail-archive.com/[email protected]/msg02479.html

/var/log/httpd/error.log seemed to suggest that apache was starting, then catching a SIGTERM 1 second later. I tried the suggested solutions (ensuring that apache's PidFile and ExtendedStatus were enabled), but they didn't work.

I debugged the RA directly, and found what I believe to be a race condition in the monitor_apache() function. The first thing it does is call silent_status(), which basically checks that there is a PID file for apache and that there is a running process with that PID. If both are true, it then calls monitor_apache_basic(), which, with my configuration, wget's http://localhost:80. If the wget fails, return code 1 is passed upwards.

The race condition is that when apache is started, it can have written its PID file but not yet completed its initialization to the point where the wget would succeed. In that window the monitor reports a failure even though apache is still coming up.

I was able to work around this problem by placing a simple "sleep 5" between starting httpd and the first call to monitor_apache().

I'm running Fedora 13 as a VirtualBox VM guest on a Win7 host, with Pacemaker 1.1.3, Corosync 1.2.8, and the apache RA from resource-agents 3.0.16.

I would appreciate it if someone more familiar with this RA could double-check my theory.

Cheers,
Brett
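
P.S. A fixed "sleep 5" is only a guess at how long apache needs. A less timing-dependent variant might poll until the check succeeds or a limit is reached. The following is a minimal sketch only: wait_until is a hypothetical helper of my own naming, not something in the apache RA, and I have not tried it inside the RA itself.

```sh
#!/bin/sh
# Hypothetical helper (NOT part of the apache RA): run a check command
# repeatedly until it succeeds or the attempts are used up.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Success: the check passed, so stop waiting.
        "$@" && return 0
        # Give up once the last allowed attempt has failed.
        [ "$attempt" -ge "$max_attempts" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (assumed values): allow up to 10 tries, 1 second apart,
# for the same kind of request that the basic monitor makes.
#   wait_until 10 1 wget -q -O /dev/null http://localhost:80
```

The idea would be to call something like this in place of the sleep, so that the start only waits as long as apache actually needs and still fails if it never answers.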
