On Wed, Dec 17, 2008 at 04:21:15PM +1100, Amos Shapira wrote:
> Hello,
> 
> We've been using linux-ha heartbeat for over a year now and it
> generally delivers well.
> 
> Today I found that in the last fail-over it just didn't start one of
> the configured resources.
> 
> Here are our configuration files:
> 
> /etc/ha.d/haresources:
> host1 \
>  drbddisk::portal-data \
>  Filesystem::/dev/drbd0::/mnt/data::ext3::noatime \
>  mysqld \
>  httpd \
>  threatmetrix_api_confirm_queue.pl
> 
> /etc/ha.d/ha.cf:
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility local0
> keepalive 2
> deadtime 30
> warntime 10
> initdead 120
> udpport 694
> ucast   eth1 10.0.0.105
> ucast   eth1 10.0.0.106
> auto_failback off
> node    host1
> node    host2
> ping    74.54.241.65
> apiauth ipfail gid=haclient uid=hacluster
> 
> And here is the log from the fail-over:
> 
> harc[14438]:    2008/12/12_16:09:46 info: Running
> /etc/ha.d/rc.d/hb_takeover hb_takeover
> heartbeat[13898]: 2008/12/12_16:09:46 info:
> portal1-prod-ks.threatmetrix.com wants to go standby [all]
> heartbeat[13898]: 2008/12/12_16:10:01 info: standby: acquire [all]
> resources from portal1-prod-ks.threatmetrix.com
> heartbeat[14454]: 2008/12/12_16:10:01 info: acquire all HA resources 
> (standby).
> ResourceManager[14467]: 2008/12/12_16:10:01 info: Acquiring resource
> group: portal1-prod-ks.threatmetrix.com drbddisk::portal-data
> Filesystem::/dev/drbd0::/mnt/data::ext3::noatime mysqld httpd
> threatmetrix_api_confirm_queue.pl
> ResourceManager[14467]: 2008/12/12_16:10:01 info: Running
> /etc/ha.d/resource.d/drbddisk portal-data start
> Filesystem[14535]:      2008/12/12_16:10:02 INFO:  Resource is stopped
> ResourceManager[14467]: 2008/12/12_16:10:02 info: Running
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/data ext3 noatime
> start
> Filesystem[14616]:      2008/12/12_16:10:02 INFO: Running start for
> /dev/drbd0 on /mnt/data
> Filesystem[14605]:      2008/12/12_16:10:02 INFO:  Success
> ResourceManager[14467]: 2008/12/12_16:10:02 info: Running
> /etc/ha.d/resource.d/mysqld  start
> ResourceManager[14467]: 2008/12/12_16:10:03 info: Running
> /etc/ha.d/resource.d/httpd  start
> heartbeat[14454]: 2008/12/12_16:10:38 info: all HA resource
> acquisition completed (standby).
> heartbeat[13898]: 2008/12/12_16:10:38 info: Standby resource
> acquisition done [all].
> heartbeat[13898]: 2008/12/12_16:10:38 info: remote resource transition
> completed.
> 
> The "threatmetrix_api_confirm_queue.pl" script is linked from
> /etc/ha.d/resources.d/, it accepts the usual "start/stop/status"
> parameters and worked before, through fail-over.
> 
> It appears from the log above that heartbeat just ignores this
> resource, even though it read the configuration file which mentions
> it.
> All other resources are started correctly.
> 
> What's the problem?

No idea. The resource manager should at least report an error if
there's something wrong with the resource or its resource agent.
Did it work before?
Is there nothing else in the logs? What does
grep threatmetrix_api_confirm_queue.pl /var/log/* show?
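
It may also be worth checking that the script is actually reachable
where heartbeat looks for it. A rough sketch, assuming the directory
shown in the log above (/etc/ha.d/resource.d); check_agent is only a
hypothetical helper name, not part of heartbeat:

```shell
#!/bin/sh
# Hypothetical helper (not part of heartbeat): report whether a resource
# script is present and executable in a given directory.
# A dangling symlink fails the -e test, so it is reported as missing.
check_agent() {
    dir=$1
    name=$2
    if [ ! -e "$dir/$name" ]; then
        echo "missing: $dir/$name"
        return 1
    fi
    if [ ! -x "$dir/$name" ]; then
        echo "not executable: $dir/$name"
        return 2
    fi
    echo "ok: $dir/$name"
    return 0
}

# Example call, assuming the directory named in the log:
# check_agent /etc/ha.d/resource.d threatmetrix_api_confirm_queue.pl
```

If that reports "ok", running the script by hand with "status" and
"start" on the node that took over would be the next thing to try.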

Thanks,

Dejan

> Thanks,
> 
> --Amos
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
