On Mon, Feb 25, 2008 at 4:26 PM, Johan Hoeke <[EMAIL PROTECTED]> wrote:
> Andreas Mock wrote:
> >> -----Original Message-----
> >> From: [EMAIL PROTECTED]
> >> [mailto:[EMAIL PROTECTED] On behalf of
> >> Johan Hoeke
> >> Sent: Monday, 25 February 2008 16:59
> >> To: General Linux-HA mailing list
> >> Subject: Re: [Linux-HA] Split brain after node reboot...argggh
> >>
> >> Maybe totally unrelated but just in case,
> >>
> >> I had a split brain situation on a 2 node cluster a little while ago.
> >> I posted the hbreport here, and Dejan concluded that it sounded an
> >> awful lot like:
> >>
> >> http://developerbugs.linux-foundation.org/show_bug.cgi?id=1768
> >>
> >> and asked me to reopen that case and upload the hb_report. I'm in the
> >> process of doing so now.
> >>
> >> See
> >> http://www.mail-archive.com/[email protected]/msg06792.html
> >>
> >> I'm changing my nodes to not start heartbeat automatically
> >> after reboot.
> >
> > Hi Johan,
> >
> > thank you for your reply. I checked the entries and saw that
> > I also got the "WARN: node A down". I really don't know why
> > the upcoming node is not able to determine the status of
> > the other node correctly. But probably this is the problem.
> >
> > Stonith-ing without waiting on the result seems really strange.
> > A pity for us that Andrew is on vacation.
> >
> > Best regards
> > Andreas Mock
>
> Hi Andreas,
>
> Please take everything I say with a grain of salt because I'm in no way
> a heartbeat expert!
>
> At the risk of further exposing my ignorance (stole that line from
> someone on this list) I'll comment on your hb_report:
>
> I noticed from your report that you have quorum enabled for your two
> node cluster. I recall reading that it is best to not use quorum on a two
> node cluster. Sorry, can't find the link just now. I saw a reference to
> a twonode quorum module in your config file, so you might have that
> covered.
>
> I had an issue with a bad iptables causing my nodes not to see the
> other's heartbeat. Iptables became active on eth1 by mistake, where we
> have our crossover heartbeat cable. Maybe a similar network problem has
> come up on your site.
>
> And your logging reads at times as if the heartbeat nodes are defined as
> pingd nodes as well:
>
> Feb 25 15:32:13 dis01 pingd: [10135]: notice: pingd_lstatus_callback: Status update: Ping node dis02 now has status [dead]
>
> I'm sure I read somewhere that you're not supposed to use your cluster
> nodes as ping nodes. I'm using the gateway as a pingd target. Maybe the
> 62.146.40.161 address from your ha.cf is a gateway or something similar,
> I can't tell.
>
> One last thing, and I'm repeating myself, I'm setting heartbeat so it
> won't start on reboot. This because I pulled the heartbeat cable as a
> test the other day, and the nodes took turns shooting each other after
> rebooting. Dejan called it a shooting match. Classic mistake I guess :)

I am curious about this. Why would it be a classic mistake? It is only a
mistake if there was some FAQ or guideline that everyone knew about that
said not to do it. I have asked the same question (about having heartbeat
start on boot) and the answers I received said that it is OK to do it.
So now I am confused.

I have not tried yanking on the heartbeat cable because I have heartbeat
set up to go out both interfaces. If I did make it only one interface and
yank the cable, I could not use STONITH, because at this point I can only
use IPMI, which requires eth0 to be available as it is IP-addressable.

When is it a good, and when a not so good, situation to start heartbeat
at boot?

regards,

Doug

> regards,
>
> Johan
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>

--
What profits a man if he gains the whole world yet loses his soul?
