Hello,

We've been using Linux-HA heartbeat for over a year now and it has generally worked well.
Today I found that during the last fail-over it did not start one of the configured resources.

Here are our configuration files:

/etc/ha.d/haresources:

host1 \
    drbddisk::portal-data \
    Filesystem::/dev/drbd0::/mnt/data::ext3::noatime \
    mysqld \
    httpd \
    threatmetrix_api_confirm_queue.pl

/etc/ha.d/ha.cf:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast eth1 10.0.0.105
ucast eth1 10.0.0.106
auto_failback off
node host1
node host2
ping 74.54.241.65
apiauth ipfail gid=haclient uid=hacluster

And here is the log from the fail-over:

harc[14438]: 2008/12/12_16:09:46 info: Running /etc/ha.d/rc.d/hb_takeover hb_takeover
heartbeat[13898]: 2008/12/12_16:09:46 info: portal1-prod-ks.threatmetrix.com wants to go standby [all]
heartbeat[13898]: 2008/12/12_16:10:01 info: standby: acquire [all] resources from portal1-prod-ks.threatmetrix.com
heartbeat[14454]: 2008/12/12_16:10:01 info: acquire all HA resources (standby).
ResourceManager[14467]: 2008/12/12_16:10:01 info: Acquiring resource group: portal1-prod-ks.threatmetrix.com drbddisk::portal-data Filesystem::/dev/drbd0::/mnt/data::ext3::noatime mysqld httpd threatmetrix_api_confirm_queue.pl
ResourceManager[14467]: 2008/12/12_16:10:01 info: Running /etc/ha.d/resource.d/drbddisk portal-data start
Filesystem[14535]: 2008/12/12_16:10:02 INFO: Resource is stopped
ResourceManager[14467]: 2008/12/12_16:10:02 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/data ext3 noatime start
Filesystem[14616]: 2008/12/12_16:10:02 INFO: Running start for /dev/drbd0 on /mnt/data
Filesystem[14605]: 2008/12/12_16:10:02 INFO: Success
ResourceManager[14467]: 2008/12/12_16:10:02 info: Running /etc/ha.d/resource.d/mysqld start
ResourceManager[14467]: 2008/12/12_16:10:03 info: Running /etc/ha.d/resource.d/httpd start
heartbeat[14454]: 2008/12/12_16:10:38 info: all HA resource acquisition completed (standby).
heartbeat[13898]: 2008/12/12_16:10:38 info: Standby resource acquisition done [all].
heartbeat[13898]: 2008/12/12_16:10:38 info: remote resource transition completed.

The "threatmetrix_api_confirm_queue.pl" script is linked from /etc/ha.d/resources.d/. It accepts the usual "start/stop/status" parameters and has worked before, including during fail-over.

From the log above it appears that heartbeat simply ignores this resource, even though it read the configuration file that mentions it: the resource is listed in the "Acquiring resource group" line, but there is no "Running ... start" line for it. All other resources are started correctly.

What's the problem?

Thanks,

--Amos
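The comparison described above (resources named in the "Acquiring resource group" line versus resources that actually got a "Running ... start" line) can be automated. The following is only a minimal sketch: it assumes the ha-log line format shown in this message, and the function name `find_unstarted` is made up for illustration and is not part of heartbeat.

```python
import re

# Matches e.g.:
#   ... info: Acquiring resource group: <node> <res1> <res2> ...
GROUP_RE = re.compile(r"Acquiring resource group: (\S+) (.+)$")

# Matches e.g.:
#   ... info: Running /etc/ha.d/resource.d/<script> [args...] start
RUN_RE = re.compile(r"info: Running \S*/resource\.d/(\S+)(?: .*)? start$")


def find_unstarted(log_lines):
    """Return resources listed in the group line that have no 'Running ... start' line."""
    expected = []
    started = set()
    for line in log_lines:
        line = line.strip()
        m = GROUP_RE.search(line)
        if m:
            # haresources entries look like script::arg1::arg2; keep only the script name.
            expected = [entry.split("::")[0] for entry in m.group(2).split()]
            continue
        m = RUN_RE.search(line)
        if m:
            started.add(m.group(1))
    # Preserve the order in which the resources were configured.
    return [name for name in expected if name not in started]


if __name__ == "__main__":
    # Assumes the log path from the logfile directive in ha.cf above.
    with open("/var/log/ha-log") as fh:
        for name in find_unstarted(fh):
            print("never started:", name)
```

Run against the log excerpt in this message, the only resource it reports is threatmetrix_api_confirm_queue.pl. It only tells you which resource was skipped, not why.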
