Vu Pham wrote:
My systems are RHEL 5.1 x86-64, with 2 Sinai HCAs, fw 1.2.0.
I set up bonding as follows:
IPOIBBOND_ENABLE=yes
IPOIB_BONDS=bond0
bond0_IP=11.1.1.1
bond0_SLAVEs=ib0,ib1
in /etc/infiniband/openib.conf in order to start ib-bond automatically
Hi Vu,
Please note that in RH5 there is native support for bonding
configuration through the initscripts tools (network scripts, etc.); see
section 3.1.2 of the ib-bonding.txt document provided with the bonding
package.
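For reference, here is a minimal sketch of what that native RH5
configuration could look like, assuming active-backup mode and the usual
ifcfg layout; the exact file names and options below are my own guess, so
please check them against section 3.1.2 of ib-bonding.txt:

# /etc/sysconfig/network-scripts/ifcfg-bond0 (the master device)
DEVICE=bond0
IPADDR=11.1.1.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no

# /etc/sysconfig/network-scripts/ifcfg-ib0 (and the same for ifcfg-ib1)
DEVICE=ib0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
USERCTL=no

# /etc/modprobe.conf - load the bonding driver in active-backup mode
alias bond0 bonding
options bond0 mode=active-backup miimon=100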
The persistency mechanism which you have used (e.g. through
/etc/init.d/openibd and /etc/openib.conf) is there only for older
distributions that have no native (*) support for bonding
configuration. Actually, I was thinking we wanted to remove it
altogether, Moni?
(*) Under RH4 the native support is broken for IPoIB/bonding, and hence
we patched some of the initscripts.
I moved our systems back to ofed-1.2.5.4 and tested ib-bond again. We
tested it with ib0 and ib1 (connected to different switches/fabrics)
being on the same subnet (10.2.1.x, 255.255.255.0) and on different
subnets (10.2.1.x and 10.3.1.x, 255.255.255.0). In both cases there is
the issue of losing communication between the servers if the nodes are
not on the same primary IB interface.
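BTW, one quick way to confirm that this is indeed what is happening is to
compare the currently active slave that the bonding driver reports on each
server; a rough sketch, assuming the standard Linux bonding /proc interface
is exposed on your kernel:

# run on both servers and compare the output
grep "Currently Active Slave" /proc/net/bonding/bond0
# e.g. "Currently Active Slave: ib0" on one host and "ib1" on the other
# would match the loss of communication you describe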
Generally speaking, I don't see the point in using bonding for
--high-availability-- where each slave is connected to a different fabric.
This is because when there's a fail-over on one system, you also need
the second system to fail over; you would not be able to count on local
link detection mechanisms either, since the remote node must now fail
over even though its local link is perfectly fine. This is true
regardless of the interconnect type.
Am I missing something here regarding your setup?
The question of the use case for bonding over separate fabrics has been
brought to me several times and I gave this answer; no one has ever tried
to educate me on why it's interesting, maybe you will do so...
Also, what do you mean by "ib0 and ib1 being on the same/different
subnets"? It's only the master device (e.g. bond0, bond1, etc.) which has
an association/configuration with an IP subnet, correct?
1. original state: ib0 is the primary on both servers - pinging bond0
between the servers is fine
2. fail ib0 on one of the servers (ib1 becomes primary on this server) -
pinging bond0 between the servers fails
sure, because there's no reason for the remote bonding to issue a fail-over
3. fail ib0 on the second server (ib1 becomes primary) - pinging bond0
between the servers is fine again
indeed.
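FWIW, if you want to bring the two servers back onto the same fabric by
hand instead of failing the link on the second server, you could also try
forcing the active slave directly; this is only a sketch, assuming your
bonding driver exposes the sysfs interface (otherwise ifenslave -c should
do the same):

# on the server that is still using ib0, switch the active slave to ib1
echo ib1 > /sys/class/net/bond0/bonding/active_slave
# or, with older tooling:
ifenslave -c bond0 ib1
# after that both masters sit on the same fabric and the ping should recover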
Or.