Hi All,

I have several pressing questions about heartbeat. I've perused the FAQ and wiki but haven't found the answers:

        - is a two-node cluster with stonith safe from split-brain,
          or do we need to configure a ping node, a tiebreaker or
          the like?

        - when we have a misconfiguration, we end up with a stonith
          "deathmatch". Both machines are either being killed or
          committing suicide. Disabling stonith requires the CIB to
          be up, but as soon as it comes up, the machines get killed.
          Is there any way to disable stonith BEFORE bringing heartbeat
          up?

        - any tips for preventing "deathmatch" situations? Surely after
          a reboot or two, it's safe to assume that more rebooting isn't
          going to improve the situation? Perhaps we have bad timeouts
          set or something?

        - heartbeat seems very keen to kill nodes. What's the rule for
          when nodes get killed? I would expect that a timeout or a
          failure to stop would justify stonithing, but a start/status
          failure?

        - the log is very verbose, but doesn't seem to include useful
          messages such as "running 'start' for 'X' on node 'Y'" or
          "killing node 'X' due to failure of 'status' op" etc. Am
          I missing the significant messages? Any way of filtering the
          basic set of "big event" messages from the rest?

Hope someone can provide some guidance here. Overall, we're finding heartbeat works very well, but the above issues are making life somewhat difficult.

cheers

--
-------------------------------------------------------------------
 Daniel Moore                              [EMAIL PROTECTED]
 SGI Australian Software Group
-------------------------------------------------------------------
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
