On Wed, Dec 17, 2008 at 04:21:15PM +1100, Amos Shapira wrote:
> Hello,
>
> We've been using linux-ha heartbeat for over a year now and it
> generally delivers well.
>
> Today I found that in the last fail-over it just didn't start one of
> the configured resources.
>
> Here are our configuration files:
>
> /etc/ha.d/haresources:
> host1 \
> drbddisk::portal-data \
> Filesystem::/dev/drbd0::/mnt/data::ext3::noatime \
> mysqld \
> httpd \
> threatmetrix_api_confirm_queue.pl
>
> /etc/ha.d/ha.cf:
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility local0
> keepalive 2
> deadtime 30
> warntime 10
> initdead 120
> udpport 694
> ucast eth1 10.0.0.105
> ucast eth1 10.0.0.106
> auto_failback off
> node host1
> node host2
> ping 74.54.241.65
> apiauth ipfail gid=haclient uid=hacluster
>
> And here is the log from the fail-over:
>
> harc[14438]: 2008/12/12_16:09:46 info: Running
> /etc/ha.d/rc.d/hb_takeover hb_takeover
> heartbeat[13898]: 2008/12/12_16:09:46 info:
> portal1-prod-ks.threatmetrix.com wants to go standby [all]
> heartbeat[13898]: 2008/12/12_16:10:01 info: standby: acquire [all]
> resources from portal1-prod-ks.threatmetrix.com
> heartbeat[14454]: 2008/12/12_16:10:01 info: acquire all HA resources
> (standby).
> ResourceManager[14467]: 2008/12/12_16:10:01 info: Acquiring resource
> group: portal1-prod-ks.threatmetrix.com drbddisk::portal-data
> Filesystem::/dev/drbd0::/mnt/data::ext3::noatime mysqld httpd
> threatmetrix_api_confirm_queue.pl
> ResourceManager[14467]: 2008/12/12_16:10:01 info: Running
> /etc/ha.d/resource.d/drbddisk portal-data start
> Filesystem[14535]: 2008/12/12_16:10:02 INFO: Resource is stopped
> ResourceManager[14467]: 2008/12/12_16:10:02 info: Running
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/data ext3 noatime
> start
> Filesystem[14616]: 2008/12/12_16:10:02 INFO: Running start for
> /dev/drbd0 on /mnt/data
> Filesystem[14605]: 2008/12/12_16:10:02 INFO: Success
> ResourceManager[14467]: 2008/12/12_16:10:02 info: Running
> /etc/ha.d/resource.d/mysqld start
> ResourceManager[14467]: 2008/12/12_16:10:03 info: Running
> /etc/ha.d/resource.d/httpd start
> heartbeat[14454]: 2008/12/12_16:10:38 info: all HA resource
> acquisition completed (standby).
> heartbeat[13898]: 2008/12/12_16:10:38 info: Standby resource
> acquisition done [all].
> heartbeat[13898]: 2008/12/12_16:10:38 info: remote resource transition
> completed.
>
> The "threatmetrix_api_confirm_queue.pl" script is linked from
> /etc/ha.d/resources.d/, it accepts the usual "start/stop/status"
> parameters and worked before, through fail-over.
>
> It appears from the log above that heartbeat just ignores this
> resource, even though it read the configuration file which mentions
> it.
> All other resources are started correctly.
>
> What's the problem?

No idea. The resource manager should at least report an error if
there's something wrong with the resource/resource agent. It worked
before? Nothing else in the logs?

grep threatmetrix_api_confirm_queue.pl /var/log/* ?

Thanks,

Dejan

> Thanks,
>
> --Amos
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
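
The checks suggested in the reply above (is the resource agent in place, and do the logs mention it anywhere) can be sketched as a small shell helper. This is only a sketch under stated assumptions: the function names find_resource_script and check_resource and the HA_RESOURCE_DIRS variable are made up for illustration, and the default directory list assumes heartbeat in haresources mode looks up resource scripts in /etc/ha.d/resource.d and /etc/init.d. Adjust it if your installation differs.

```shell
#!/bin/sh
# Sketch only: locate a haresources resource script and search the logs for it.
# Assumption: resource scripts are looked up in /etc/ha.d/resource.d and
# /etc/init.d; override HA_RESOURCE_DIRS (space-separated) if yours differ.
HA_RESOURCE_DIRS="${HA_RESOURCE_DIRS:-/etc/ha.d/resource.d /etc/init.d}"

# find_resource_script NAME   (hypothetical helper)
# Prints the path of the first executable regular file called NAME found in
# HA_RESOURCE_DIRS. Symlinks are followed, so a dangling link counts as missing.
# Returns 1 and prints nothing if no such file exists.
find_resource_script() {
    name=$1
    for dir in $HA_RESOURCE_DIRS; do
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# check_resource NAME   (hypothetical helper)
# Reports whether the script was found, then greps the logs for its name.
check_resource() {
    name=$1
    if path=$(find_resource_script "$name"); then
        echo "found: $path"
    else
        echo "missing or not executable: $name"
    fi
    # -F treats the name as a fixed string (the '.' in '.pl' stays literal);
    # -s silences errors about directories and unreadable files.
    grep -Fs -- "$name" /var/log/* || echo "no log lines mention $name"
}
```

After sourcing the file, it could be run on the node that took over as: check_resource threatmetrix_api_confirm_queue.pl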
