I think I must be missing something somewhere. I have configured an Apache and VIP failover EXACTLY as per the instructions on this page:
http://www.clusterlabs.org/wiki/Example_configurations#Failover_IP_.2B_One_service

Specifically, I did this:

Failover IP + One service

Here we assume that:
* our service that we migrate is Apache
* we monitor the IP every 10 seconds
* we monitor the service (Apache) every 15 seconds

Here we create a resource which will use IPaddr (OCF script, provided by Heartbeat).
Tell this resource that it has one parameter (ip) which has one value (85.9.12.3).
Tell this resource that it has one operation (monitor) which has one parameter (interval) with one value (10s).

    crm(test-conf)configure# primitive failover-ip ocf:heartbeat:IPaddr params ip=85.9.12.3 op monitor interval=10s

Here we create another resource which will use apache (LSB script, default location /etc/init.d/apache).
Tell this resource that it has one operation (monitor) which has one parameter (interval) with one value (15s).

    crm(test-conf)configure# primitive failover-apache lsb::apache op monitor interval=15s

Again, I did it exactly as shown. I started up the cluster and I am able to fail it back and forth with "service heartbeat stop". Everything works perfectly. However, as a test I stopped httpd on the primary node and removed /etc/init.d/httpd so that it would not restart the service. In my very rookie-type mind, I expect a failover, but none occurs.
I get this in the ha-log:

Apr 07 11:45:21 APAUAT1A.intranet.mydomain.com lrmd: [3252]: ERROR: (raexeclsb.c:execra:267) execv failed for /etc/init.d/httpd: No such file or directory
Apr 07 11:45:21 APAUAT1A.intranet.mydomain.com crmd: [2350]: info: process_lrm_event: LRM operation httpd_2_monitor_120000 (call=7, rc=5, cib-update=14, confirmed=false) not installed
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_ha_callback: Update relayed from apauat1b.intranet.mydomain.com
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_local_callback: Expanded fail-count-httpd_2=value++ to 1
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-httpd_2 (1)
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com lrmd: [2347]: info: cancel_op: operation monitor[7] on lsb::httpd::httpd_2 for client 2350, its parameters: CRM_meta_interval=[120000] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor] cancelled
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com crmd: [2350]: info: do_lrm_rsc_op: Performing key=2:13:0:ed667959-46a8-4f54-b10e-1c642c0f2fff op=httpd_2_stop_0 )
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_perform_update: Sent update 16: fail-count-httpd_2=1
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_ha_callback: Update relayed from apauat1b.intranet.mydomain.com
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com lrmd: [2347]: info: rsc:httpd_2:8: stop
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com crmd: [2350]: info: process_lrm_event: LRM operation httpd_2_monitor_120000 (call=7, status=1, cib-update=0, confirmed=true) Cancelled
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-httpd_2 (1270655122)
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_perform_update: Sent update 19: last-failure-httpd_2=1270655122
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com lrmd: [3253]: WARN: For LSB init script, no additional parameters are needed.
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com lrmd: [3253]: ERROR: (raexeclsb.c:execra:267) execv failed for /etc/init.d/httpd: No such file or directory
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com lrmd: [2347]: WARN: Managed httpd_2:stop process 3253 exited with return code 5.
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com crmd: [2350]: info: process_lrm_event: LRM operation httpd_2_stop_0 (call=8, rc=5, cib-update=15, confirmed=true) not installed
Apr 07 11:45:23 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_ha_callback: Update relayed from apauat1b.intranet.mydomain.com
Apr 07 11:45:23 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-httpd_2 (INFINITY)
Apr 07 11:45:23 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_perform_update: Sent update 21: fail-count-httpd_2=INFINITY
Apr 07 11:45:23 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_ha_callback: Update relayed from apauat1b.intranet.mydomain.com
Apr 07 11:45:23 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-httpd_2 (1270655123)
Apr 07 11:45:23 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_perform_update: Sent update 23: last-failure-httpd_2=1270655123
Apr 07 11:53:09 APAUAT1A.intranet.mydomain.com cib: [2346]: info: cib_stats: Processed 60 operations (666.00us average, 0% utilization) in the last 10min

Obviously I'm missing something here, but what? I know the log file says it can't find httpd, but I expect Linux-HA to see that the service cannot be restarted and perform a failover. How can I make this happen?
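To read a log like this more easily, here is a minimal Python sketch, assuming the ha-log line format shown above, that pulls out the LRM operation results and the fail-count updates. The regexes and the names `summarize`, `describe_op` and `SAMPLE` are my own, not part of Heartbeat or Pacemaker. It relies on two things visible in the log: the operation key ends in `_<op>_<interval in ms>` (compare `CRM_meta_interval=[120000]`), and crmd prints rc=5 as "not installed", which matches the standard OCF code OCF_ERR_INSTALLED.

```python
import re

# Standard OCF resource-agent exit codes 0-7. crmd prints rc=5 as
# "not installed", which matches OCF_ERR_INSTALLED.
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
}

# "LRM operation <rsc>_<op>_<interval_ms> (call=N, rc=N|status=N, ...) <result>"
LRM_OP = re.compile(
    r"LRM operation (?P<rsc>\S+)_(?P<op>[a-z]+)_(?P<interval>\d+) "
    r"\(call=(?P<call>\d+), (?P<kind>rc|status)=(?P<code>\d+)[^)]*\) (?P<result>.+)$"
)
# "Sent update N: fail-count-<rsc>=<value>"
FAIL_COUNT = re.compile(r"Sent update \d+: fail-count-(?P<rsc>\S+)=(?P<value>\S+)$")


def describe_op(m):
    """Turn one matched LRM operation line into a short readable summary."""
    interval_ms = int(m["interval"])
    what = f'{m["rsc"]} {m["op"]}'
    if interval_ms:
        # Recurring operations carry their interval in milliseconds.
        what += f" (every {interval_ms // 1000}s)"
    if m["kind"] == "rc":
        code = int(m["code"])
        return f'{what}: rc={code} {OCF_RC.get(code, "UNKNOWN")} ({m["result"]})'
    # status=... lines (e.g. a cancelled monitor) have no OCF return code.
    return f'{what}: {m["result"]}'


def summarize(lines):
    """Return LRM results and fail-count updates, in log order."""
    out = []
    for line in lines:
        m = LRM_OP.search(line)
        if m:
            out.append(describe_op(m))
            continue
        m = FAIL_COUNT.search(line)
        if m:
            out.append(f'fail-count-{m["rsc"]} -> {m["value"]}')
    return out


# A few lines copied verbatim from the ha-log above.
SAMPLE = [
    "Apr 07 11:45:21 APAUAT1A.intranet.mydomain.com crmd: [2350]: info: process_lrm_event: LRM operation httpd_2_monitor_120000 (call=7, rc=5, cib-update=14, confirmed=false) not installed",
    "Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_perform_update: Sent update 16: fail-count-httpd_2=1",
    "Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com crmd: [2350]: info: process_lrm_event: LRM operation httpd_2_monitor_120000 (call=7, status=1, cib-update=0, confirmed=true) Cancelled",
    "Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com crmd: [2350]: info: process_lrm_event: LRM operation httpd_2_stop_0 (call=8, rc=5, cib-update=15, confirmed=true) not installed",
    "Apr 07 11:45:23 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: attrd_perform_update: Sent update 21: fail-count-httpd_2=INFINITY",
]

if __name__ == "__main__":
    for entry in summarize(SAMPLE):
        print(entry)
```

Run on those lines, the summary shows the sequence in order: the 120s monitor of httpd_2 fails with rc=5, the fail-count goes to 1, the monitor is cancelled, the stop also fails with rc=5, and the fail-count jumps to INFINITY. To run it over a whole log you could pass an open file, e.g. `summarize(open(path))`, where `path` is wherever your ha-log lives (an assumption; it varies by setup).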
_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
