Hi All,
I have several pressing questions about heartbeat. I've perused the FAQ
and wiki but haven't come up with the answers:
- is a two-node cluster with stonith safe from split-brain,
  or do we need to configure a ping node, tiebreaker, or
  the like?
- when we have a misconfiguration, we end up in a stonith
  "deathmatch": both machines are either being killed or
  committing suicide. Disabling stonith requires the CIB to
  be up, but as soon as it comes up, the machines get killed.
  Is there any way to disable stonith BEFORE bringing heartbeat
  up?
- any tips for preventing "deathmatch" situations? Surely after
  a reboot or two it's safe to assume that further rebooting isn't
  going to improve the situation? Perhaps our timeouts are set
  badly, or something similar?
- heartbeat seems very keen to kill nodes. What's the rule for
  when a node gets killed? I would expect a timeout or a failed
  stop to justify stonithing, but a failed start or status?
- the log is very verbose, but doesn't seem to include useful
  messages such as "running 'start' for 'X' on node 'Y'" or
  "killing node 'X' due to failure of 'status' op". Am
  I missing the significant messages? Is there any way to filter
  the basic set of "big event" messages out from the rest?
Hope someone can provide some guidance here. Overall, we're finding
that heartbeat works very well, but the above issues are making life
somewhat difficult.
cheers
--
-------------------------------------------------------------------
Daniel Moore [EMAIL PROTECTED]
SGI Australian Software Group
-------------------------------------------------------------------
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems