Re: [DRBD-user] switch was down, all drbd machines rebootet

Scott Inderlied Fri, 03 Jul 2009 18:54:57 -0700

Now you know why the bottom of 
http://www.drbd.org/users-guide/ch-heartbeat.html is so emphatic about having 
two _independent_ channels not dependent on a SPOF, such as your switch.


'Heartbeat communication channels
Heartbeat uses a UDP-based communication protocol to periodically check
for node availability (the "heartbeat" proper). For this purpose,
Heartbeat can use several communication methods, including:
        • IP multicast,
        • IP broadcast,
        • IP unicast,
        • serial line.
Of these, IP multicast and IP broadcast are the most relevant in practice. The
absolute minimum requirement for stable cluster operation is _two independent_
communication channels.'
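
As a rough sketch only, a second channel in the ha.cf quoted below could look
something like this. The eth1 interface, the 192.168.100.x addresses and the
/dev/ttyS0 device are assumptions for illustration (a direct crossover cable
or a null-modem cable between the two nodes); nothing in the original post
says this hardware is present.

    # existing channel, goes through the switch
    ucast eth0 172.20.1.1
    ucast eth0 172.20.2.1

    # assumed second channel: direct back-to-back link, no switch involved
    ucast eth1 192.168.100.1
    ucast eth1 192.168.100.2

    # assumed alternative: null-modem serial link (baud goes before serial)
    baud 19200
    serial /dev/ttyS0

Either of the two additions would give Heartbeat a path that survives a
switch outage, so the nodes would not declare each other dead just because
the switch went away.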


From: [email protected] 
[mailto:[email protected]] On Behalf Of Heiko
Sent: Friday, July 03, 2009 09:01
To: [email protected]
Subject: [DRBD-user] switch was down, all drbd machines rebootet

Hello,

I had an earlier discussion here where we came to the conclusion that using
Protocol C can cause crashes.
Yesterday we had problems with one of our switches, so the DRBD-enabled
machines couldn't see each other.
Then all the machines rebooted, which created split brains and a lot of work.
Do you think the crashes/reboots are caused by the same problem, or can we
prevent this behaviour by optimizing our
Heartbeat/DRBD config? I'll attach the drbd.conf and the ha.cf.

---------------------------------
drbd.conf

common {
  protocol C;
}



resource drbd_backend {
  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error   detach;
  }
  net {
  }
  syncer {
    rate 500M;
    al-extents 257;
  }

  on xen-B1.fra1.mailcluster {
    device    /dev/drbd0;
    disk      /dev/md3;
    address   172.20.2.1:7788;
    meta-disk internal;
  }
  on xen-A1.fra1.mailcluster {
    device    /dev/drbd0;
    disk      /dev/md3;
    address   172.20.1.1:7788;
    meta-disk internal;
  }
}

---------------------------------------
ha.cf

#use_logd on
logfile /var/log/ha-log
debugfile /var/log/ha-debug
logfacility local0
keepalive 2
deadtime 10
warntime 3
initdead 20
udpport 694
ucast eth0 172.20.1.1
ucast eth0 172.20.2.1
node xen-A1.fra1.mailcluster
node xen-B1.fra1.mailcluster
auto_failback on



Thanks a lot


.r

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
