Hello,

We've been using linux-ha heartbeat for over a year now and it
generally delivers well.

Today I found that in the last fail-over it just didn't start one of
the configured resources.

Here are our configuration files:

/etc/ha.d/haresources:
host1 \
 drbddisk::portal-data \
 Filesystem::/dev/drbd0::/mnt/data::ext3::noatime \
 mysqld \
 httpd \
 threatmetrix_api_confirm_queue.pl

/etc/ha.d/ha.cf:
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast   eth1 10.0.0.105
ucast   eth1 10.0.0.106
auto_failback off
node    host1
node    host2
ping    74.54.241.65
apiauth ipfail gid=haclient uid=hacluster

And here is the log from the fail-over:

harc[14438]:    2008/12/12_16:09:46 info: Running
/etc/ha.d/rc.d/hb_takeover hb_takeover
heartbeat[13898]: 2008/12/12_16:09:46 info:
portal1-prod-ks.threatmetrix.com wants to go standby [all]
heartbeat[13898]: 2008/12/12_16:10:01 info: standby: acquire [all]
resources from portal1-prod-ks.threatmetrix.com
heartbeat[14454]: 2008/12/12_16:10:01 info: acquire all HA resources (standby).
ResourceManager[14467]: 2008/12/12_16:10:01 info: Acquiring resource
group: portal1-prod-ks.threatmetrix.com drbddisk::portal-data
Filesystem::/dev/drbd0::/mnt/data::ext3::noatime mysqld httpd
threatmetrix_api_confirm_queue.pl
ResourceManager[14467]: 2008/12/12_16:10:01 info: Running
/etc/ha.d/resource.d/drbddisk portal-data start
Filesystem[14535]:      2008/12/12_16:10:02 INFO:  Resource is stopped
ResourceManager[14467]: 2008/12/12_16:10:02 info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/data ext3 noatime
start
Filesystem[14616]:      2008/12/12_16:10:02 INFO: Running start for
/dev/drbd0 on /mnt/data
Filesystem[14605]:      2008/12/12_16:10:02 INFO:  Success
ResourceManager[14467]: 2008/12/12_16:10:02 info: Running
/etc/ha.d/resource.d/mysqld  start
ResourceManager[14467]: 2008/12/12_16:10:03 info: Running
/etc/ha.d/resource.d/httpd  start
heartbeat[14454]: 2008/12/12_16:10:38 info: all HA resource
acquisition completed (standby).
heartbeat[13898]: 2008/12/12_16:10:38 info: Standby resource
acquisition done [all].
heartbeat[13898]: 2008/12/12_16:10:38 info: remote resource transition
completed.

The "threatmetrix_api_confirm_queue.pl" script is linked from
/etc/ha.d/resources.d/, it accepts the usualt "start/stop/status"
parameters and worked before, through fail-over.

It appears from the log above that heartbeat just ignores this
resource, even though it read the configuration file which mentions
it.
All other resources are started correctly.

What's the problem?

Thanks,

--Amos
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Reply via email to