Hi List,

We are running heartbeat 2.1.3-3 on CentOS 5.3 using old-style heartbeat 1.x configs, with MySQL 5.0.77 on top of DRBD. What we have seen is that when mysqld performs crash recovery, heartbeat thinks the service has failed to start and ends up switching to the other node. The preferred behaviour would be for heartbeat to realise that mysqld has started and leave it to finish its crash recovery.
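
To make the preferred behaviour concrete, here is a minimal shell sketch of a start-side check that keeps waiting while the server process is still alive, and reports failure only once the process has exited or a generous timeout has passed. The function name wait_for_ready, its arguments, and the 600-second default are assumptions made for illustration; none of this comes from heartbeat or from the CentOS mysql-server package.

```shell
#!/bin/sh
# Hypothetical helper (not part of heartbeat or the CentOS init script).
# Usage: wait_for_ready PID READY_CMD [MAX_SECONDS]
#   PID         process id of the server that was just started
#   READY_CMD   command that exits 0 once the server accepts connections
#   MAX_SECONDS upper bound on the wait (assumed default: 600)
wait_for_ready() {
    pid=$1
    ready_cmd=$2
    max=${3:-600}
    waited=0
    while [ "$waited" -lt "$max" ]; do
        # Server answers: report a successful start.
        if sh -c "$ready_cmd" >/dev/null 2>&1; then
            return 0
        fi
        # Process is gone: this is a genuine start failure.
        if ! kill -0 "$pid" 2>/dev/null; then
            return 1
        fi
        # Process alive but not ready (e.g. still recovering): keep waiting.
        sleep 1
        waited=$((waited + 1))
    done
    # Still not ready after MAX_SECONDS: give up.
    return 1
}
```

A wrapper around the init script might call it as wait_for_ready "$(pidof mysqld)" "mysqladmin ping" 600. How the PID is obtained and whether mysqladmin needs credentials depend on the local setup.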

I found this article, which provides an alternate init.d script:

http://dev.mysql.com/doc/refman/5.0/en/ha-heartbeat-drbd.html

At the moment we use a symlink to the real init.d script provided by the CentOS mysql-server package. This must be a common problem, so I was wondering whether there are any articles about it or any best-practice advice. Would the init.d script in the MySQL docs fix this issue? (From a cursory glance, I think the answer is no.)

Thanks,
Joel

Here are the heartbeat logs:

"""
ResourceManager[2094]: 2009/10/24_18:03:31 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/data ext3 noatime start
Filesystem[2804]: 2009/10/24_18:03:31 INFO: Running start for /dev/drbd0 on /mnt/data
Filesystem[2793]: 2009/10/24_18:03:31 INFO: Success
ResourceManager[2094]: 2009/10/24_18:03:31 info: Running /etc/ha.d/resource.d/mysqld start
heartbeat[1693]: 2009/10/24_18:03:46 info: Heartbeat restart on node host2-prod.domain.com
heartbeat[1693]: 2009/10/24_18:03:46 info: Link host2-prod.domain.com:eth1 up.
heartbeat[1693]: 2009/10/24_18:03:46 info: Status update for node host2-prod.domain.com: status init
heartbeat[1693]: 2009/10/24_18:03:46 info: Status update for node host2-prod.domain.com: status up
heartbeat[1693]: 2009/10/24_18:03:46 info: Status update for node host2-prod.domain.com: status active
heartbeat[1693]: 2009/10/24_18:03:47 info: remote resource transition completed.
ResourceManager[2094]: 2009/10/24_18:04:02 ERROR: Return code 1 from /etc/ha.d/resource.d/mysqld
ResourceManager[2094]: 2009/10/24_18:04:02 CRIT: Giving up resources due to failure of mysqld
ResourceManager[2094]: 2009/10/24_18:04:02 info: Releasing resource group: host1-prod.domain.com IPaddr2::208.85.150.73/32/eth0 IPaddr2::192.168.128.2/32/eth1 IPaddr2::10.12.128.2/32/eth2 drbddisk::host-data Filesystem::/dev/drbd0::/mnt/data::ext3::noatime mysqld httpd collectd monit
ResourceManager[2094]: 2009/10/24_18:04:02 info: Running /etc/init.d/monit stop
ResourceManager[2094]: 2009/10/24_18:04:02 info: Running /etc/init.d/collectd stop
ResourceManager[2094]: 2009/10/24_18:04:07 info: Running /etc/ha.d/resource.d/httpd stop
ResourceManager[2094]: 2009/10/24_18:04:07 info: Running /etc/ha.d/resource.d/mysqld stop
ResourceManager[2094]: 2009/10/24_18:04:07 ERROR: Return code 1 from /etc/ha.d/resource.d/mysqld
ResourceManager[2094]: 2009/10/24_18:04:08 info: Retrying failed stop operation [mysqld]
ResourceManager[2094]: 2009/10/24_18:04:08 info: Running /etc/ha.d/resource.d/mysqld stop
ResourceManager[2094]: 2009/10/24_18:04:08 ERROR: Return code 1 from /etc/ha.d/resource.d/mysqld
ResourceManager[2094]: 2009/10/24_18:04:09 info: Retrying failed stop operation [mysqld]
ResourceManager[2094]: 2009/10/24_18:04:09 info: Running /etc/ha.d/resource.d/mysqld stop
ResourceManager[2094]: 2009/10/24_18:04:09 ERROR: Return code 1 from /etc/ha.d/resource.d/mysqld
ResourceManager[2094]: 2009/10/24_18:04:10 info: Retrying failed stop operation [mysqld]
"""

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
