I think I must be missing something somewhere. I have configured an 
Apache and VIP failover EXACTLY as per the instructions on this page:

http://www.clusterlabs.org/wiki/Example_configurations#Failover_IP_.2B_One_service


Specifically I did this:


  Failover IP + One service

Here we assume that:

    * our service that we migrate is Apache
    * we monitor the IP every 10 seconds
    * we monitor the Service(apache) every 15 seconds

Here we create a resource which will use IPaddr (OCF script, provided by 
Heartbeat). Tell this resource that it has one parameter (ip) which 
has one value (85.9.12.3). Tell this resource that it has one 
operation (monitor) which has one parameter (interval) with one value (10s).

crm(test-conf)configure# primitive failover-ip ocf:heartbeat:IPaddr params 
ip=85.9.12.3 op monitor interval=10s

Here we create another resource which will use apache (LSB script, 
default location /etc/init.d/apache). Tell this resource that it has 
one operation (monitor) which has one parameter (interval) with one 
value (15s).

crm(test-conf)configure# primitive failover-apache lsb::apache op monitor 
interval=15s

Again, I did it exactly as shown. I started up the cluster and I am able 
to fail it back and forth with service heartbeat stop. Everything works 
perfectly. However, as a test I stopped httpd on the primary node and 
removed /etc/init.d/httpd so that it would not restart the service. In 
my very rookie-type mind, I expected a failover, but none occurred. I get 
this in the ha-log:
Apr 07 11:45:21 APAUAT1A.intranet.mydomain.com lrmd: [3252]: ERROR: 
(raexeclsb.c:execra:267) execv failed for /etc/init.d/httpd: No such 
file or directory
Apr 07 11:45:21 APAUAT1A.intranet.mydomain.com crmd: [2350]: info: 
process_lrm_event: LRM operation httpd_2_monitor_120000 (call=7, rc=5, 
cib-update=14, confirmed=false) not installed
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: 
attrd_ha_callback: Update relayed from apauat1b.intranet.mydomain.com
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: 
attrd_local_callback: Expanded fail-count-httpd_2=value++ to 1
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: 
attrd_trigger_update: Sending flush op to all hosts for: 
fail-count-httpd_2 (1)
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com lrmd: [2347]: info: 
cancel_op: operation monitor[7] on lsb::httpd::httpd_2 for client 2350, 
its parameters: CRM_meta_interval=[120000] CRM_meta_timeout=[60000] 
crm_feature_set=[3.0.1] CRM_meta_name=[monitor]  cancelled
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com crmd: [2350]: info: 
do_lrm_rsc_op: Performing 
key=2:13:0:ed667959-46a8-4f54-b10e-1c642c0f2fff op=httpd_2_stop_0 )
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: 
attrd_perform_update: Sent update 16: fail-count-httpd_2=1
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: 
attrd_ha_callback: Update relayed from apauat1b.intranet.mydomain.com
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com lrmd: [2347]: info: 
rsc:httpd_2:8: stop
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com crmd: [2350]: info: 
process_lrm_event: LRM operation httpd_2_monitor_120000 (call=7, 
status=1, cib-update=0, confirmed=true) Cancelled
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: 
attrd_trigger_update: Sending flush op to all hosts for: 
last-failure-httpd_2 (1270655122)
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: 
attrd_perform_update: Sent update 19: last-failure-httpd_2=1270655122
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com lrmd: [3253]: WARN: For 
LSB init script, no additional parameters are needed.
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com lrmd: [3253]: ERROR: 
(raexeclsb.c:execra:267) execv failed for /etc/init.d/httpd: No such 
file or directory
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com lrmd: [2347]: WARN: 
Managed httpd_2:stop process 3253 exited with return code 5.
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com crmd: [2350]: info: 
process_lrm_event: LRM operation httpd_2_stop_0 (call=8, rc=5, 
cib-update=15, confirmed=true) not installed
Apr 07 11:45:23 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: 
attrd_ha_callback: Update relayed from apauat1b.intranet.mydomain.com
Apr 07 11:45:23 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: 
attrd_trigger_update: Sending flush op to all hosts for: 
fail-count-httpd_2 (INFINITY)
Apr 07 11:45:23 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: 
attrd_perform_update: Sent update 21: fail-count-httpd_2=INFINITY
Apr 07 11:45:23 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: 
attrd_ha_callback: Update relayed from apauat1b.intranet.mydomain.com
Apr 07 11:45:23 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: 
attrd_trigger_update: Sending flush op to all hosts for: 
last-failure-httpd_2 (1270655123)
Apr 07 11:45:23 APAUAT1A.intranet.mydomain.com attrd: [2349]: info: 
attrd_perform_update: Sent update 23: last-failure-httpd_2=1270655123
Apr 07 11:53:09 APAUAT1A.intranet.mydomain.com cib: [2346]: info: 
cib_stats: Processed 60 operations (666.00us average, 0% utilization) in 
the last 10min
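
The lines that report operation results in the log above are the 
process_lrm_event ones. As a reading aid, here is a minimal Python sketch 
that pulls out just those results. It is not part of Heartbeat or 
Pacemaker: the helper name lrm_results and the regular expression are my 
own assumptions, based only on the three process_lrm_event lines shown 
above (with the mail wrapping undone).

```python
import re

# Excerpt from the ha-log above, with the mail client's line wrapping undone.
LOG = """\
Apr 07 11:45:21 APAUAT1A.intranet.mydomain.com crmd: [2350]: info: process_lrm_event: LRM operation httpd_2_monitor_120000 (call=7, rc=5, cib-update=14, confirmed=false) not installed
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com crmd: [2350]: info: process_lrm_event: LRM operation httpd_2_monitor_120000 (call=7, status=1, cib-update=0, confirmed=true) Cancelled
Apr 07 11:45:22 APAUAT1A.intranet.mydomain.com crmd: [2350]: info: process_lrm_event: LRM operation httpd_2_stop_0 (call=8, rc=5, cib-update=15, confirmed=true) not installed
"""

# Hypothetical pattern for "LRM operation <resource>_<action>_<interval> (call=N, rc=N, ...) <text>".
# Lines that report status= instead of rc= (the cancelled monitor) do not match and are skipped.
PATTERN = re.compile(
    r"LRM operation (?P<rsc>\S+)_(?P<action>[a-z]+)_(?P<interval>\d+) "
    r"\(call=(?P<call>\d+), rc=(?P<rc>\d+),[^)]*\) (?P<text>.+)$"
)


def lrm_results(log_text):
    """Return (resource, action, interval_ms, rc, text) for each reported result."""
    results = []
    for line in log_text.splitlines():
        m = PATTERN.search(line)
        if m:
            results.append(
                (m["rsc"], m["action"], int(m["interval"]), int(m["rc"]), m["text"].strip())
            )
    return results


if __name__ == "__main__":
    for rsc, action, interval, rc, text in lrm_results(LOG):
        print(f"{rsc}: {action} (interval={interval}ms) rc={rc} ({text})")
```

Run against the excerpt, this lists two results for httpd_2: the 
recurring monitor and the stop that followed it, both with rc=5 
("not installed").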

Obviously I'm missing something here, but what? I know the log file says 
it can't find httpd, but I expect Linux-HA to see that the service cannot 
be restarted and do a failover. How can I make this happen?