[Linux-HA] heartbeat continually failing over during mysqld crash recovery

Joel Heenan Wed, 04 Nov 2009 10:55:47 -0800

Hi List,

We are running heartbeat 2.1.3-3 on CentOS 5.3 using old-style heartbeat 1.x
configs, with MySQL 5.0.77 on top of DRBD. What we have seen is that when
mysqld performs crash recovery, heartbeat thinks the service has failed to
start and fails over to the other node. The preferred behaviour is for
heartbeat to realise that mysqld has started and leave it to finish its
crash recovery.
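In the logs below, about 31 seconds pass between "mysqld start" (18:03:31)
and "Return code 1" (18:04:02). Our unverified guess is that this is the
init script giving up after a fixed wait, not mysqld itself exiting. A
minimal sketch of the start behaviour we would prefer, written in Python for
illustration only (wait_until_ready, probe and process_alive are
hypothetical names, not part of heartbeat or the CentOS script): once the
timeout expires, a mysqld that is still running but not yet answering is
treated as started, and failure is reported only if the process has died.

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool],
                     process_alive: Callable[[], bool],
                     timeout: float,
                     interval: float = 1.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Hypothetical start-wait loop; returns an init-script style exit code."""
    deadline = clock() + timeout
    while True:
        if probe():
            return 0  # server answers: fully started
        if not process_alive():
            return 1  # mysqld has exited: a genuine start failure
        if clock() >= deadline:
            # Still running but not answering yet: assume it is in crash
            # recovery and report success instead of triggering a failover.
            return 0
        sleep(interval)
```

In a real resource script, probe might wrap "mysqladmin ping" and
process_alive might check the PID in mysqld's pid file; both are
assumptions. The trade-off is that a mysqld that never finishes recovering
would be reported as started, so something else (for example the monit
instance already in our resource group) would have to catch that case.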


I found this article, which provides an alternative init.d script:

http://dev.mysql.com/doc/refman/5.0/en/ha-heartbeat-drbd.html

At the moment the heartbeat resource script (/etc/ha.d/resource.d/mysqld in
the logs below) is a symlink to the real init.d script provided by the
CentOS mysql-server package.

This must be a common problem, so I was wondering whether there are any
articles or best-practice advice on it. Would the init.d script in the MySQL
docs fix this issue? (From a cursory glance, I think the answer is no.)

Thanks

Joel

Here are the heartbeat logs:

"""

ResourceManager[2094]:  2009/10/24_18:03:31 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/data ext3 noatime start
Filesystem[2804]:       2009/10/24_18:03:31 INFO: Running start for /dev/drbd0 on /mnt/data
Filesystem[2793]:       2009/10/24_18:03:31 INFO:  Success
ResourceManager[2094]:  2009/10/24_18:03:31 info: Running /etc/ha.d/resource.d/mysqld  start
heartbeat[1693]: 2009/10/24_18:03:46 info: Heartbeat restart on node host2-prod.domain.com
heartbeat[1693]: 2009/10/24_18:03:46 info: Link host2-prod.domain.com:eth1 up.
heartbeat[1693]: 2009/10/24_18:03:46 info: Status update for node host2-prod.domain.com: status init
heartbeat[1693]: 2009/10/24_18:03:46 info: Status update for node host2-prod.domain.com: status up
heartbeat[1693]: 2009/10/24_18:03:46 info: Status update for node host2-prod.domain.com: status active
heartbeat[1693]: 2009/10/24_18:03:47 info: remote resource transition completed.
ResourceManager[2094]:  2009/10/24_18:04:02 ERROR: Return code 1 from /etc/ha.d/resource.d/mysqld
ResourceManager[2094]:  2009/10/24_18:04:02 CRIT: Giving up resources due to failure of mysqld
ResourceManager[2094]:  2009/10/24_18:04:02 info: Releasing resource group: host1-prod.domain.com IPaddr2::208.85.150.73/32/eth0 IPaddr2::192.168.128.2/32/eth1 IPaddr2::10.12.128.2/32/eth2 drbddisk::host-data Filesystem::/dev/drbd0::/mnt/data::ext3::noatime mysqld httpd collectd monit
ResourceManager[2094]:  2009/10/24_18:04:02 info: Running /etc/init.d/monit stop
ResourceManager[2094]:  2009/10/24_18:04:02 info: Running /etc/init.d/collectd  stop
ResourceManager[2094]:  2009/10/24_18:04:07 info: Running /etc/ha.d/resource.d/httpd  stop
ResourceManager[2094]:  2009/10/24_18:04:07 info: Running /etc/ha.d/resource.d/mysqld  stop
ResourceManager[2094]:  2009/10/24_18:04:07 ERROR: Return code 1 from /etc/ha.d/resource.d/mysqld
ResourceManager[2094]:  2009/10/24_18:04:08 info: Retrying failed stop operation [mysqld]
ResourceManager[2094]:  2009/10/24_18:04:08 info: Running /etc/ha.d/resource.d/mysqld  stop
ResourceManager[2094]:  2009/10/24_18:04:08 ERROR: Return code 1 from /etc/ha.d/resource.d/mysqld
ResourceManager[2094]:  2009/10/24_18:04:09 info: Retrying failed stop operation [mysqld]
ResourceManager[2094]:  2009/10/24_18:04:09 info: Running /etc/ha.d/resource.d/mysqld  stop
ResourceManager[2094]:  2009/10/24_18:04:09 ERROR: Return code 1 from /etc/ha.d/resource.d/mysqld
ResourceManager[2094]:  2009/10/24_18:04:10 info: Retrying failed stop operation [mysqld]


"""
_______________________________________________
Linux-HA mailing list
linux-ha@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
