Vu Pham wrote:
My systems are RHEL 5.1 x86-64, 2 Sinai HCAs, fw 1.2.0
I set up bonding as follows:
IPOIBBOND_ENABLE=yes
IPOIB_BONDS=bond0
bond0_IP=11.1.1.1
bond0_SLAVEs=ib0,ib1
in /etc/infiniband/openib.conf in order to start ib-bond automatically

Hi Vu,

Please note that in RH5 there's native support for bonding configuration through the initscripts tools (network scripts, etc); see section 3.1.2 of the ib-bonding.txt document provided with the bonding package.

The persistency mechanism which you have used (e.g. through /etc/init.d/openibd and /etc/openib.conf) is there only for somewhat OLD distributions for which there's no native (*) support for bonding configuration. Actually, I was thinking we wanted to remove it altogether, Moni?

(*) Under RH4 the native support is broken for IPoIB bonding and hence we patched some of the initscripts scripts.
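
Just for illustration, the native configuration under RHEL 5 would be roughly along the lines below (the authoritative description is section 3.1.2 of ib-bonding.txt; the netmask and the module options here are my guess, adjust them to your setup):

/etc/sysconfig/network-scripts/ifcfg-bond0:
    DEVICE=bond0
    IPADDR=11.1.1.1
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none
    USERCTL=no

/etc/sysconfig/network-scripts/ifcfg-ib0 (and similarly ifcfg-ib1 with DEVICE=ib1):
    DEVICE=ib0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none
    USERCTL=no

/etc/modprobe.conf:
    alias bond0 bonding
    options bond0 mode=active-backup miimon=100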

I moved our systems back to ofed-1.2.5.4 and tested ib-bond again. We tested it with ib0 and ib1 (connected to different switches/fabrics) being on the same subnet (10.2.1.x, 255.255.255.0) and on different subnets (10.2.1.x and 10.3.1.x, 255.255.255.0). In both cases we see the issue of losing communication between the servers when the nodes are not on the same primary IB interface.

Generally speaking, I don't see the point in using bonding for --high-availability-- where each slave is connected to a different fabric. This is because when there's a fail-over on one system, you also need the second system to fail over; you would also not be able to count on local link detection mechanisms, since the remote node must now fail over even though its own local link is perfectly fine. This is correct regardless of the interconnect type.

Am I missing something here regarding your setup?

The question about the use case of bonding over separate fabrics has been brought to me several times and I gave this answer; no-one has ever tried to educate me on why it's interesting, maybe you will do so...

Also, what do you mean by "ib0 and ib1 being on the same/different subnets"? It's only the master device (e.g. bond0, bond1, etc.) that has an association/configuration with an IP subnet, correct?

1. original state: ib0 is the primary on both servers - pinging bond0 between the servers is fine

2. fail ib0 on one of the servers (ib1 becomes primary on this server) - pinging bond0 between the servers fails
Sure, because there's no reason for the remote bonding to issue a fail-over.

3. fail ib0 on the second server (ib1 becomes primary) - pinging bond0 between the servers is fine again
indeed.
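
Btw, when running such steps it helps to verify on each node which slave is currently active, using the standard bonding status (nothing IB specific here):

    grep "Currently Active Slave" /proc/net/bonding/bond0

and, if the bonding sysfs interface is available in your kernel, you can also force a fail-over by hand for testing, something like:

    echo ib1 > /sys/class/net/bond0/bonding/active_slave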

Or.

