On Mon, Feb 25, 2008 at 4:26 PM, Johan Hoeke <[EMAIL PROTECTED]> wrote:
> Andreas Mock wrote:
>  >> -----Original Message-----
>  >> From: [EMAIL PROTECTED]
>  >> [mailto:[EMAIL PROTECTED] On Behalf Of
>  >> Johan Hoeke
>  >> Sent: Monday, 25 February 2008 16:59
>  >> To: General Linux-HA mailing list
>  >> Subject: Re: [Linux-HA] Split brain after node reboot...argggh
>  >>
>  >> Maybe totally unrelated but just in case,
>  >>
>  >> I had a split-brain situation on a 2-node cluster a little while ago. I
>  >> posted the hb_report here, and Dejan concluded that it sounded an awful
>  >> lot like:
>  >>
>  >> http://developerbugs.linux-foundation.org/show_bug.cgi?id=1768
>  >>
>  >> and asked me to reopen that case and upload the hb_report. I'm in the
>  >> process of doing so now.
>  >>
>  >> See
>  >> http://www.mail-archive.com/[email protected]/msg06792.html
>  >>
>  >> I'm changing my nodes to not start heartbeat automatically after reboot.
>  >
>  >
>  > Hi Johan,
>  >
>  > thank you for your reply. I checked the entries and saw that
>  > I also got the "WARN: node A down". I really don't know why
>  > the upcoming node is not able to determine the status of
>  > the other node correctly. But probably this is the problem.
>  >
>  > Stonith-ing without waiting on the result seems really strange.
>  > A pity for us that Andrew is on vacation.
>  >
>  > Best regards
>  > Andreas Mock
>
>  Hi Andreas,
>
>  Please take everything I say with a grain of salt because I'm in no way
>  a heartbeat expert!
>
>  At the risk of further exposing my ignorance (stole that line from
>  someone on this list) I'll comment on your hb_report:
>
>  I noticed from your report that you have quorum enabled for your two
>  node cluster. I recall reading that it is best not to use quorum on a two
>  node cluster. Sorry, can't find the link just now. I saw a reference to
>  a twonode quorum module in your config file, so you might have that
>  covered.
>
>  I had an issue with a bad iptables setup that kept my nodes from seeing
>  each other's heartbeat. Iptables became active by mistake on eth1, where we
>  have our crossover heartbeat cable. Maybe a similar network problem has
>  come up on your site.
>
>  And your logging reads at times as if the heartbeat nodes are defined as
>  pingd nodes as well:
>
>  Feb 25 15:32:13 dis01 pingd: [10135]: notice: pingd_lstatus_callback:
>  Status update: Ping node dis02 now has status [dead]
>
>  I'm sure I read somewhere that you're not supposed to use your cluster
>  nodes as ping nodes. I'm using the gateway as a pingd target. Maybe the
>  62.146.40.161 address from your ha.cf is a gateway or something similar,
>  I can't tell.
>
>  One last thing, and I'm repeating myself: I'm setting heartbeat so it
>  won't start on reboot. This is because I pulled the heartbeat cable as a
>  test the other day, and the nodes took turns shooting each other after
>  rebooting. Dejan called it a shooting match. Classic mistake, I guess :)

I am curious about this.  Why would it be a classic mistake?  It is only
a mistake if there was some FAQ or guideline that everyone knew about
that said not to do it.  I have asked the same question (about having
heartbeat start on boot) and the answers I received said that it is OK
to do it.
So now I am confused.  I have not tried yanking on the heartbeat cable
because I have heartbeat set up to go out both interfaces.  If I used
only one interface and yanked the cable, I could not use STONITH,
because at this point I can only use IPMI, which requires eth0 to be
available as it is IP-addressable.

When is it a good idea, and when is it not, to start heartbeat at boot?

regards,

Doug



>  regards,
>
>  Johan
>
>
>
>
> _______________________________________________
>  Linux-HA mailing list
>  [email protected]
>  http://lists.linux-ha.org/mailman/listinfo/linux-ha
>  See also: http://linux-ha.org/ReportingProblems
>



-- 
What profits a man if he gains the whole world yet loses his soul?