Andreas Mock wrote:
>> -----Original Message-----
>> From: [EMAIL PROTECTED] 
>> [mailto:[EMAIL PROTECTED] On Behalf Of 
>> Johan Hoeke
>> Sent: Monday, 25 February 2008 16:59
>> To: General Linux-HA mailing list
>> Subject: Re: [Linux-HA] Split brain after node reboot...argggh
>>
>> Maybe totally unrelated but just in case,
>>
>> I had a split-brain situation on a 2-node cluster a little while ago.
>> I posted the hb_report here, and Dejan concluded that it sounded an
>> awful lot like:
>>
>> http://developerbugs.linux-foundation.org/show_bug.cgi?id=1768
>>
>> and asked me to reopen that case and upload the hb_report. I'm in
>> the process of doing so now.
>>
>> See 
>> http://www.mail-archive.com/[email protected]/msg06792.html
>>
>> I'm changing my nodes to not start heartbeat automatically after
>> reboot.
> 
> 
> Hi Johan,
> 
> thank you for your reply. I checked the entries and saw that
> I also got the "WARN: node A down". I really don't know why
> the node coming up is not able to determine the status of
> the other node correctly. But this is probably the problem.
> 
> STONITHing without waiting for the result seems really strange.
> A pity for us that Andrew is on vacation.
> 
> Best regards
> Andreas Mock

Hi Andreas,

Please take everything I say with a grain of salt because I'm in no way
a heartbeat expert!

At the risk of further exposing my ignorance (stole that line from
someone on this list) I'll comment on your hb_report:

I noticed from your report that you have quorum enabled for your
two-node cluster. I recall reading that it is best not to use quorum on
a two-node cluster. Sorry, can't find the link just now. I saw a
reference to a twonode quorum module in your config file, so you might
have that covered.

I had an issue with a bad iptables setup that kept my nodes from seeing
each other's heartbeat. Iptables became active on eth1 by mistake, where
we have our crossover heartbeat cable. Maybe a similar network problem
has come up at your site.
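
To check for this kind of mistake, a dump of the firewall rules can be
scanned for DROP or REJECT rules bound to the heartbeat interface. A
minimal sketch, assuming the rules were saved with `iptables -S` and
that eth1 is the heartbeat interface as on my nodes; the function names
are made up for illustration:

```python
import shlex


def option_value(tokens, flag):
    """Return the value following `flag`, ignoring negated ("! flag") matches."""
    for i, tok in enumerate(tokens[:-1]):
        if tok == flag and (i == 0 or tokens[i - 1] != "!"):
            return tokens[i + 1]
    return None


def blocking_rules(rules_text, interface="eth1"):
    """Return rules from `iptables -S` output that DROP or REJECT
    traffic arriving on the given interface."""
    hits = []
    for line in rules_text.splitlines():
        tokens = shlex.split(line)
        if not tokens or tokens[0] != "-A":
            continue  # skip policy (-P) and chain definition (-N) lines
        if (option_value(tokens, "-i") == interface
                and option_value(tokens, "-j") in ("DROP", "REJECT")):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    # Hypothetical rule dump, not taken from any real node.
    sample = (
        "-P INPUT ACCEPT\n"
        "-A INPUT -i eth0 -p tcp -m tcp --dport 22 -j ACCEPT\n"
        "-A INPUT -i eth1 -j DROP\n"
    )
    for rule in blocking_rules(sample):
        print(rule)
```

It flags every DROP/REJECT rule on that interface, whether or not the
rule would actually match heartbeat packets, so the hits still need to
be read by hand.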

And your logging reads at times as if the heartbeat nodes are defined as
pingd nodes as well:

Feb 25 15:32:13 dis01 pingd: [10135]: notice: pingd_lstatus_callback:
Status update: Ping node dis02 now has status [dead]

I'm sure I read somewhere that you're not supposed to use your cluster
nodes as ping nodes. I'm using the gateway as a pingd target. Maybe the
62.146.40.161 address from your ha.cf is a gateway or something similar,
I can't tell.
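
One way to spot this is to pull the ping node names out of the pingd
log messages and compare them with the cluster node names. A rough
sketch, assuming the message format matches the line quoted above and
that dis01 and dis02 are the two cluster nodes (my guess from the log);
the function name is hypothetical:

```python
import re

# Matches the pingd status message quoted above.
PINGD_STATUS = re.compile(r"Ping node (\S+) now has status \[(\w+)\]")


def ping_nodes_in_cluster(log_text, cluster_nodes):
    """Return the sorted ping node names seen in the log that are
    also cluster members."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    seen = {m.group(1) for m in PINGD_STATUS.finditer(flat)}
    return sorted(seen & set(cluster_nodes))


if __name__ == "__main__":
    log = (
        "Feb 25 15:32:13 dis01 pingd: [10135]: notice: pingd_lstatus_callback:\n"
        "Status update: Ping node dis02 now has status [dead]\n"
    )
    print(ping_nodes_in_cluster(log, ["dis01", "dis02"]))
```

This only compares names literally; a ping node given as an IP address
would have to be resolved to a host name before it could be matched.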

One last thing, and I'm repeating myself: I'm setting heartbeat so it
won't start on reboot. This is because I pulled the heartbeat cable as a
test the other day, and the nodes took turns shooting each other after
rebooting. Dejan called it a shooting match. Classic mistake, I guess :)

regards,

Johan




_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
