----- Original Message -----
> From: "Arnold Krille" <[email protected]>
> To: [email protected]
> Sent: Wednesday, June 13, 2012 3:04:04 PM
> Subject: Re: [DRBD-user] Corosync Configuration
> 
> On 13.06.2012 17:56, William Seligman wrote:
> > A data point:
> > 
> > On my cluster, I have two dedicated direct-link cables between the
> > two nodes, one for DRBD traffic, the other for corosync/pacemaker
> > traffic. Roughly once per week, I get a "link down" message on one
> > of the nodes:
> 
> A) Use several communication rings in corosync. We use one on the
> regular user network and a second on the storage network. If one
> fails, no problem: corosync doesn't see a need to fence anything.
> B) Use bonded/bridged interfaces for the storage connection. We
> currently have our storage network (aka vlan17) as a tagged vlan on
> eth0 of all the servers and untagged on eth1, using a bond in
> active-backup mode where eth1 is the primary and vlan17 the backup.
> 
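
Regarding A), for anyone who hasn't set up redundant rings yet: a minimal
totem section for corosync 1.x with two rings looks roughly like the sketch
below. The networks, multicast addresses and ports are made-up examples.

  totem {
      version: 2
      rrp_mode: passive        # redundant ring protocol; 'passive' or
                               # 'active' both survive a single ring failure
      interface {
          ringnumber: 0
          bindnetaddr: 192.168.10.0    # user network (example)
          mcastaddr: 226.94.1.1
          mcastport: 5405
      }
      interface {
          ringnumber: 1
          bindnetaddr: 192.168.20.0    # storage network (example)
          mcastaddr: 226.94.1.2
          mcastport: 5407
      }
  }
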
> With these two in place, I didn't even notice when my boss unplugged
> the network cables of one of our servers one by one. DRBD didn't feel
> a glitch, nor did the cluster see a need to move/kill/fence anything.
> And a 5-second hang for the x2go sessions on one of the machines
> doesn't matter when everyone is on break.
> 
> I haven't yet figured out how to build the bridges/bonds when all the
> servers have 4 nics. But that isn't a real problem until I've also
> done functionality tests with two (or three) new switches.
> I think I will do one bridge of two ports with RSTP for the normal
> user network and one bridge of two ports with RSTP for the storage
> network, then skip the active-backup bonding and rely on RSTP to find
> the paths. Of course this wouldn't necessarily improve throughput
> between two nodes, but throughput from one node to two nodes would
> probably be higher.
> Or I extend my current setup and, instead of eth0 and eth1, use one
> pair of bonded ports for each, which would give me a total of three
> bonds per server: two in one of the 'real' modes and one in
> active-backup mode...
> 
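
On the RSTP idea: note that the plain in-kernel Linux bridge only speaks
classic STP; real RSTP needs something like mstpd on the hosts, or you let
the RSTP-capable switches handle it. A rough brctl sketch, with interface
and bridge names as placeholders:

  brctl addbr br0            # bridge for the storage network
  brctl addif br0 eth2
  brctl addif br0 eth3
  brctl stp br0 on           # classic STP only
  ip link set br0 up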

This whole thing is somewhat off-topic for DRBD and we should probably move
the thread to the pacemaker mailing list, but since no one has complained I'll
chime in with our setup :-)

We have a pair of bonded nics in each server, using round robin (balance-rr)
and directly connected to each other, carrying the DRBD sync traffic and the
first corosync ring.
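
For reference, a back-to-back balance-rr bond like that can be written
roughly like this in a Debian-style /etc/network/interfaces with the
ifenslave package (device names and addresses are invented; an ifcfg-based
distro would look a bit different):

  # bond0: two nics cabled directly to the peer node
  auto bond0
  iface bond0 inet static
      address 192.168.100.1        # peer would be .2
      netmask 255.255.255.0
      bond-slaves eth2 eth3
      bond-mode balance-rr         # round robin across both links
      bond-miimon 100              # check link state every 100 ms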

Then we have a second pair of nics on our regular network.  These use
802.3ad link aggregation through an HP 5412 chassis switch.  That switch
supports link aggregation across separate switch modules, so the two nics on
each server are connected to different modules, giving us greater fault
tolerance as well as aggregation.  We run our second corosync ring on that bond.
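
The 802.3ad bond is the same idea with a different mode; a sketch in the
same style (again invented names/addresses, and the two switch ports have
to be configured as a matching LACP trunk on the 5412):

  # bond1: LACP bond to the chassis switch, second corosync ring runs here
  auto bond1
  iface bond1 inet static
      address 10.0.0.11
      netmask 255.255.255.0
      bond-slaves eth0 eth1
      bond-mode 802.3ad            # LACP; needs matching switch-side config
      bond-miimon 100
      bond-xmit-hash-policy layer3+4   # spread flows across the links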

This config theoretically lets us survive three nic/cable failures without
requiring a STONITH.
We've not had a STONITH event due to a link problem since we got this setup
running (over 9 months ago).

Jake
