Re: [Linux-HA] reconfiguring network interfaces causes split brain

Dejan Muhamedagic Wed, 26 Sep 2007 03:45:20 -0700

Hi,

On Wed, Sep 26, 2007 at 10:51:19AM +0200, Raoul Bhatia [IPAX] wrote:
> hello,
> 
> i'll try to keep things short so please do not consider it rude:
> 
> 2 (debian 4.0) nodes: eth0 = external; eth1 = hb channel
> 
> the cluster has been in the state:
> 
> >Current DC: webcluster02 (917954cd-0285-4fcd-9cd2-671736c4de66)
> >2 Nodes configured.
> > ...
> >Node: webcluster01 (49e81295-8e2f-4aeb-98f3-a14de6f62298): online
> >Node: webcluster02 (917954cd-0285-4fcd-9cd2-671736c4de66): online
> 
> on webcluster01 i issued /etc/init.d/networking restart causing:
> >Sep 26 10:33:59 webcluster01 kernel: [78706.462355] tg3: eth1: Link is down.
> >Sep 26 10:34:02 webcluster01 kernel: [78708.758368] tg3: eth1: Link is up at 1000 Mbps, full duplex.
> >Sep 26 10:34:02 webcluster01 kernel: [78708.764028] tg3: eth1: Flow control is on for TX and on for RX.
> >Sep 26 10:34:29 webcluster01 heartbeat: [31919]: WARN: node webcluster02: is dead
> >Sep 26 10:34:29 webcluster01 heartbeat: [31919]: info: Link webcluster02:eth1 dead.
> >Sep 26 10:34:29 webcluster01 crmd: [31937]: notice: crmd_ha_status_callback: Status update: Node webcluster02 now has status [dead]
> >Sep 26 10:34:29 webcluster01 ccm: [31932]: info: Break tie for 2 nodes cluster
> 
> now, crm_mon is in a split-brain situation:
> 
> webcluster01:
> >Current DC: webcluster01 (49e81295-8e2f-4aeb-98f3-a14de6f62298)
> >2 Nodes configured.
> >...
> >Node: webcluster01 (49e81295-8e2f-4aeb-98f3-a14de6f62298): online
> >Node: webcluster02 (917954cd-0285-4fcd-9cd2-671736c4de66): OFFLINE
>                                                              ^^^^^^^
> 
> webcluster02:
> > Current DC: webcluster02 (917954cd-0285-4fcd-9cd2-671736c4de66)
> > 2 Nodes configured.
> > ...
> > Node: webcluster01 (49e81295-8e2f-4aeb-98f3-a14de6f62298): online
> > Node: webcluster02 (917954cd-0285-4fcd-9cd2-671736c4de66): online
>                                                              ^^^^^^
> 
> Q: how do i resolve this issue without restarting heartbeat?


There is probably no other reliable way. Most of the time
Heartbeat recovers from split brain, as it should, but there
seems to be a bug/deficiency in the algorithm which sometimes
leaves the cluster in a state similar to the one you encountered.
See
http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1546
for more details.
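
To make the stuck state easier to spot, here is a rough sketch (not
part of Heartbeat) that compares the crm_mon output of each node and
reports where the views differ. It assumes the line format quoted
above ("Current DC: ..." and "Node: ... : online"); the names
parse_crm_mon, find_disagreements and VIEWS are made up for this
example, and collecting the output from each node is left out. It only
detects the disagreement, it does not resolve it.

```python
import re

# Patterns assume the crm_mon line format quoted in this thread.
DC_RE = re.compile(r"Current DC:\s+(\S+)")
NODE_RE = re.compile(r"Node:\s+(\S+)\s+\([0-9a-f-]+\):\s+(\S+)")


def parse_crm_mon(text):
    """Return (dc_name, {node_name: status}) parsed from crm_mon-style text."""
    dc = None
    nodes = {}
    for line in text.splitlines():
        match = DC_RE.search(line)
        if match:
            dc = match.group(1)
            continue
        match = NODE_RE.search(line)
        if match:
            nodes[match.group(1)] = match.group(2).lower()
    return dc, nodes


def find_disagreements(views):
    """views maps the reporting host to its crm_mon text.

    Returns a list of strings describing where the hosts disagree.
    """
    parsed = {host: parse_crm_mon(text) for host, text in views.items()}
    problems = []

    # More than one DC across the views is the split-brain symptom.
    dcs = sorted({dc for dc, _ in parsed.values() if dc is not None})
    if len(dcs) > 1:
        problems.append("multiple DCs: " + ", ".join(dcs))

    # Compare the status each host reports for every known node.
    all_nodes = sorted({name for _, nodes in parsed.values() for name in nodes})
    for name in all_nodes:
        seen = {host: nodes[name] for host, (_, nodes) in parsed.items()
                if name in nodes}
        if len(set(seen.values())) > 1:
            detail = ", ".join(f"{host} sees {status}"
                               for host, status in sorted(seen.items()))
            problems.append(f"{name} status differs: {detail}")
    return problems


# Sample input: the two views quoted in this thread.
VIEWS = {
    "webcluster01": """\
Current DC: webcluster01 (49e81295-8e2f-4aeb-98f3-a14de6f62298)
2 Nodes configured.
Node: webcluster01 (49e81295-8e2f-4aeb-98f3-a14de6f62298): online
Node: webcluster02 (917954cd-0285-4fcd-9cd2-671736c4de66): OFFLINE
""",
    "webcluster02": """\
Current DC: webcluster02 (917954cd-0285-4fcd-9cd2-671736c4de66)
2 Nodes configured.
Node: webcluster01 (49e81295-8e2f-4aeb-98f3-a14de6f62298): online
Node: webcluster02 (917954cd-0285-4fcd-9cd2-671736c4de66): online
""",
}

for problem in find_disagreements(VIEWS):
    print(problem)
```

With the two views from your mail it reports both symptoms: each node
claims to be the DC, and the nodes disagree on the status of
webcluster02.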

> shouldn't
> there be a check to avoid this kind of split-brain situation?

What do you mean by "this kind"? BTW, split brain conditions
should be avoided at any cost. That doesn't mean, however, that
we shouldn't do our best to recover from them.

> do you need any further information?

You could attach the logs to the bugzilla entry.

Thanks.

Dejan

> cheers,
> raoul bhatia
> 
> ps: i do not use stonith yet as i do not want stonith to interfere with
> configuration errors :)
> -- 
> ____________________________________________________________________
> DI (FH) Raoul Bhatia M.Sc.          email.          [EMAIL PROTECTED]
> Technischer Leiter
> 
> IPAX - Aloy Bhatia Hava OEG         web.          http://www.ipax.at
> Barawitzkagasse 10/2/2/11           email.            [EMAIL PROTECTED]
> 1190 Wien                           tel.               +43 1 3670030
> FN 277995t HG Wien                  fax.            +43 1 3670030 15
> ____________________________________________________________________
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
