Raoul Bhatia [IPAX] wrote:
Adrian Chapela wrote:
<master_slave id="MySQL_Server">
[snip]
<operations>
<op id="mysqld-child-monitor" name="monitor"
interval="20s" timeout="19s" prereq="nothing"/>
<op id="mysqld-child-start" name="start" prereq="nothing"/>
</operations>
</primitive>
</master_slave>
I think this line:
<op id="mysqld-child-monitor" name="monitor" interval="20s"
timeout="19s" prereq="nothing"/>
is the one that configures the monitor operation and how often it
runs. As I read it, the interval is 20 seconds. However, when I manually
introduce an error on the Master MySQL server to test failover, the
monitor operation is not executed and Heartbeat does not detect the
error.
If I run the script manually, the error is detected, but Heartbeat is
not running the script in monitor mode, so it does not know about the
problem.
This is the crm_mon output:
[snip]
Yes, I already did this and I am now testing more options. A Slave
server now fails over correctly, but I still have some problems with my
mysql script ( http://code.adrianchapela.net/heartbeat/mysql_slave_master ).
One of them is the stop operation. After a failure, my mysql resource is
stopped, but the MySQL monitor keeps reporting that the server is down
and failed. Heartbeat knows the server has failed. When I stop
Heartbeat, it cannot shut down cleanly. It logs this:
crmd[8531]: 2008/02/26_11:09:10 ERROR: verify_stopped: Resource
mysqld-child:0 was active at shutdown. You may ignore this error if it
is unmanaged.
crmd[8531]: 2008/02/26_11:09:10 info: process_client_disconnect:
Received HUP from tengine:[-1]
crmd[8531]: 2008/02/26_11:09:10 ERROR: verify_stopped: Resource
mysqld-child:0 was active at shutdown. You may ignore this error if it
is unmanaged.
And this:
tengine[8566]: 2008/02/26_11:09:09 info: te_connect_stonith: Attempting
connection to fencing daemon...
crmd[8531]: 2008/02/26_11:09:09 info: stop_subsystem: Sent -TERM to
tengine: [8566]
tengine[8566]: 2008/02/26_11:09:09 ERROR: stonithd_signon: Can't
initiate connection to stonithd
crmd[8531]: 2008/02/26_11:09:09 info: do_shutdown: Waiting for
subsystems to exit
tengine[8566]: 2008/02/26_11:09:09 notice: Not currently connected.
crmd[8531]: 2008/02/26_11:09:09 info: do_shutdown: All subsystems
stopped, conti
I am searching for information about these errors. How can I force the
stop operation? Should the STONITH daemon shut down the server
automatically?
please refer to [1] and add more monitoring actions for all applicable
roles.
cheers,
raoul
[1]
http://www.linux-ha.org/ClusterInformationBase/Actions#head-951a50aae161c116d73c95aa0659873ee7a2973b
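For illustration, the advice to add monitor actions for each role might look roughly like the fragment below. This is only a sketch, not a tested configuration: the op ids and the 21s/20s values are made up for the example, and it assumes the CRM version in use accepts the role attribute on <op> with the values "Master" and "Slave". The two intervals are deliberately different, on the assumption that monitor operations on the same resource need distinct intervals to be treated as separate operations.

```xml
<operations>
  <!-- Hypothetical ids and intervals; adjust to the real setup. -->
  <!-- Monitor the instance while it runs in the Slave role. -->
  <op id="mysqld-child-monitor-slave" name="monitor" role="Slave"
      interval="21s" timeout="19s" prereq="nothing"/>
  <!-- Monitor the instance while it runs in the Master role.
       The interval differs from the Slave monitor on purpose. -->
  <op id="mysqld-child-monitor-master" name="monitor" role="Master"
      interval="20s" timeout="19s" prereq="nothing"/>
  <op id="mysqld-child-start" name="start" prereq="nothing"/>
</operations>
```

With only a single monitor op and no role, a promoted instance may not be checked at all, which would match the behaviour described above where a failure on the Master goes unnoticed.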