Hi Moni,

My systems are RHEL 5.1 x86-64 with 2 Sinai HCAs, fw 1.2.0.

I set up bonding as follows:
IPOIBBOND_ENABLE=yes
IPOIB_BONDS=bond0
bond0_IP=11.1.1.1
bond0_SLAVES=ib0,ib1
in /etc/infiniband/openib.conf, so that ib-bond starts automatically.

Testing with OFED-1.3-beta2, I got the following crash while the system was booting up:

Stack: ffffffff883429d0 fff810428519d30 ................
Call Trace:
[<ffffffff883429d0>] :bonding:bond_get_stats+0x4a/0x131
[<        8020e9cd>] rtnetlink_fill_ifinfo+0x4ba/0x5c4
             ee19>] rtmsg_ifinfo+0x44/0x8d
             eea2>] rtnetlink_event+0x40/0x44
         8006492a>] notifier_call_chain+0x20/0x32
         80208b5e>] dev_open+0x68/0x6e
             72e8>] dev_change_flags+0x5a/0x119
         80239762>] devinet_ioctl+0x235/0x59c
         801ffcf6>] sock_ioctl+0x1c1/0x1e5
         8003fc3f>] do_ioctl+0x21/0x6b
         8002fa45>] vfs_ioctl+0x248/0x261
         8004a24b>] sys_ioctl+0x59/0x78
         8005b14e>] system_call+0x7e/0x83

Code: Bad RIP value
RIP [0000000000000000000000] _stext+0x7ffff000/0x1000
RSP <ffff10428519cc0>
CR2: 000000000000000000000
<0>Kernel panic - not syncing: Fatal exception

I opened bug #812 for this issue.

I moved our systems back to ofed-1.2.5.4 and tested ib-bond again. We tested it with ib0 and ib1 (each connected to a different switch/fabric) both on the same subnet (10.2.1.x, 255.255.255.0) and on different subnets (10.2.1.x and 10.3.1.x, 255.255.255.0). In both cases the servers lose communication with each other whenever they are not using the same primary IB interface.

Example:
1. Original state: ib0 is the primary on both servers - pinging bond0 between the servers is fine.
2. Fail ib0 on one of the servers (ib1 becomes primary on this server) - pinging bond0 between the servers fails.
3. Fail ib0 on the second server (ib1 becomes primary) - pinging bond0 between the servers is fine again.
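
The three states above can be sketched as a toy model. It only illustrates the observed pattern and is not a diagnosis: it assumes that ib0 and ib1 sit on two fabrics that are not connected to each other, so two servers can talk only while their active slaves are on the same fabric. The fabric names and the can_ping helper are hypothetical and appear nowhere in OFED.

```python
# Toy model of the failover sequence. Assumption (not verified): the two
# fabrics are NOT interconnected, so a ping succeeds only when both
# servers' active slaves are on the same fabric.
FABRIC_OF = {"ib0": "fabric_a", "ib1": "fabric_b"}  # hypothetical names


def can_ping(active_on_server1, active_on_server2):
    """Return True if both active slaves are attached to the same fabric."""
    return FABRIC_OF[active_on_server1] == FABRIC_OF[active_on_server2]


# (description, active slave on server 1, active slave on server 2)
steps = [
    ("1. ib0 primary on both servers", "ib0", "ib0"),
    ("2. ib0 failed on one server", "ib1", "ib0"),
    ("3. ib0 failed on both servers", "ib1", "ib1"),
]

results = [can_ping(a, b) for _, a, b in steps]

for (label, _, _), ok in zip(steps, results):
    print(f"{label}: ping {'ok' if ok else 'fails'}")
```

If the fabrics were in fact reachable from each other, this model would not apply and the failure in step 2 would need a different explanation.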

Is this behavior expected?

thanks,
-vu
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general