Raoul Bhatia [IPAX] wrote:
Adrian Chapela wrote:
      <master_slave id="MySQL_Server">
[snip]
          <operations>
            <op id="mysqld-child-monitor" name="monitor" interval="20s" timeout="19s" prereq="nothing"/>
            <op id="mysqld-child-start" name="start" prereq="nothing"/>
          </operations>
        </primitive>
      </master_slave>

I think this line, <op id="mysqld-child-monitor" name="monitor" interval="20s" timeout="19s" prereq="nothing"/>, is the one that configures the monitor operation and how often it runs. As I read it, the interval is 20 seconds. To test failover, I manually caused an error on the master MySQL server, but the monitor operation is not being executed and Heartbeat does not detect the error.

If I run the script manually, the error is detected, but Heartbeat is not running the script in monitor mode, so it does not know about the problem. This is the crm_mon output:

[snip]
Yes, I already did this and I am now testing more options. The slave server now fails over correctly, but I have some problems with my mysql script ( http://code.adrianchapela.net/heartbeat/mysql_slave_master ). One of them is the stop operation. After a failure, my mysql resource is stopped, but the MySQL monitor keeps reporting that the server is down and failed. Heartbeat knows the server has failed. When I then stop the Heartbeat server, it cannot shut down cleanly. It says this:

crmd[8531]: 2008/02/26_11:09:10 ERROR: verify_stopped: Resource mysqld-child:0 was active at shutdown. You may ignore this error if it is unmanaged.
crmd[8531]: 2008/02/26_11:09:10 info: process_client_disconnect: Received HUP from tengine:[-1]
crmd[8531]: 2008/02/26_11:09:10 ERROR: verify_stopped: Resource mysqld-child:0 was active at shutdown. You may ignore this error if it is unmanaged.

And this:
tengine[8566]: 2008/02/26_11:09:09 info: te_connect_stonith: Attempting connection to fencing daemon...
crmd[8531]: 2008/02/26_11:09:09 info: stop_subsystem: Sent -TERM to tengine: [8566]
tengine[8566]: 2008/02/26_11:09:09 ERROR: stonithd_signon: Can't initiate connection to stonithd
crmd[8531]: 2008/02/26_11:09:09 info: do_shutdown: Waiting for subsystems to exit
tengine[8566]: 2008/02/26_11:09:09 notice: Not currently connected.
crmd[8531]: 2008/02/26_11:09:09 info: do_shutdown: All subsystems stopped, conti

I am searching for information about these errors. How can I force the stop operation? Should the STONITH daemon shut down the server automatically?

please refer to [1] and add more monitoring actions for all applicable
roles.
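
As a rough sketch only, assuming the role attribute on <op> as described in [1]: the two monitor ids and the 30s/29s values below are made up for illustration and are not taken from your CIB. One monitor per role could look like this:

```xml
<operations>
  <!-- Illustrative assumption: ids "…-monitor-master" / "…-monitor-slave" and the 30s/29s values are hypothetical -->
  <op id="mysqld-child-monitor-master" name="monitor" interval="20s" timeout="19s" role="Master" prereq="nothing"/>
  <op id="mysqld-child-monitor-slave" name="monitor" interval="30s" timeout="29s" role="Slave" prereq="nothing"/>
  <op id="mysqld-child-start" name="start" prereq="nothing"/>
</operations>
```

As far as I know, the monitor operations for different roles should use different intervals so that the cluster can tell them apart; please check [1] before relying on this.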

cheers,
raoul

[1] http://www.linux-ha.org/ClusterInformationBase/Actions#head-951a50aae161c116d73c95aa0659873ee7a2973b

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems