We have a four-box cluster that we just upgraded to RHEL5.4. This required an upgrade to the 1.5 version of OFED. We are using bonding over two physical links and ipoib. The final detail is that we are using IPv4 multicast to push data from one box to the other three.

Under 1.4, this worked. (Yeah!)
Under 1.5, it doesn't.

By "not working" I mean:
 o IB is able to see the mesh.
 o IPv4 over the bond is working (I can ping, scp files, and similar)
 o Multicast does NOT.

When I looked closer, I could see that I get an error -22 on the multicast joins (using a qlogic switch's SM) for everything _except_ the broadcast join. I switched over to opensm, since it has far better debugging abilities, and I see the same behavior, though in that case opensm logs a message with error 1B11.

When I looked through the code, I found that this error code is associated with an invalid set of component masks:

Oct 20 12:40:05 824130 [44240940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B11: method = SubnAdmSet, scope_state = 0x1, component mask = 0x0000000000010083, expected comp mask = 0x00000000000130c7, MGID: ff12:601b:ffff::16 from port 0x0002c90300032431 (x3 HCA-1)
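
For reference, here is a small sketch that decodes the two masks from the log line above into field names. The bit positions are an assumption: they follow the MCMemberRecord component-mask layout as I read it in include/rdma/ib_sa.h (the IB_SA_MCMEMBER_REC_* definitions), so check them against your own tree. The names MCMEMBER_FIELDS and decode_comp_mask are made up for this example.

```python
# Assumed bit order of the MCMemberRecord component mask
# (bit 0 first), as read from include/rdma/ib_sa.h.
MCMEMBER_FIELDS = [
    "MGID", "PORT_GID", "QKEY", "MLID",
    "MTU_SELECTOR", "MTU", "TRAFFIC_CLASS", "PKEY",
    "RATE_SELECTOR", "RATE", "PACKET_LIFE_TIME_SELECTOR",
    "PACKET_LIFE_TIME", "SL", "FLOW_LABEL", "HOP_LIMIT",
    "SCOPE", "JOIN_STATE", "PROXY_JOIN",
]


def decode_comp_mask(mask):
    """Return the names of the fields whose bits are set in mask."""
    return [name for bit, name in enumerate(MCMEMBER_FIELDS)
            if mask & (1 << bit)]


# Values copied from the opensm ERR 1B11 log line.
sent = 0x0000000000010083
expected = 0x00000000000130C7

print("sent:    ", decode_comp_mask(sent))
print("expected:", decode_comp_mask(expected))
print("missing: ", decode_comp_mask(expected & ~sent))
```

Under that assumed layout, the request carries MGID, PORT_GID, PKEY and JOIN_STATE, and the bits opensm expects but does not get are QKEY, TRAFFIC_CLASS, SL and FLOW_LABEL.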

I looked through drivers/infiniband/ulp/ipoib/ipoib_multicast.c and found the following interesting bits:
 o The broadcast join is done with the presumption that the broadcast groups already exist (and they do).
 o In the ipoib_mcast_send() data path, ipoib_mcast_sendonly_join() is called directly (the multicast task is not used). This path, however, does not set the component_mask bits required to pass the 1B11 check (check_create_comp_mask()).

I looked at the git log (from ofed_kernel_1_5) for ipoib_multicast.c and don't see any commits that would appear to be anywhere near this area.

Does anyone have any clue as to what is going on here?

Thank you,
--stuart

p.s. the output from the debugfs:
[r...@x3 ipoib]# pwd
/sys/kernel/debug/ipoib
[r...@ce-x3 ipoib]# more ib0_mcg
GID: ff12:401b:ffff:0:0:0:0:3a01
  created: 4295351581
  queuelen:         0
  complete:        no
  send_only:      yes

GID: ff12:401b:ffff:0:0:0:ffff:ffff
  created: 4295326209
  queuelen:         0
  complete:       yes
  send_only:       no
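
To make the GIDs easier to compare, here is a minimal sketch that splits an IPoIB multicast GID into its fields. The layout is an assumption based on my reading of RFC 4391 (0xFF, flags, scope, a 16-bit signature that is 0x401B for IPv4 and 0x601B for IPv6, the 16-bit P_Key, then the group ID in the low 80 bits). The function name split_ipoib_mgid is made up for this example.

```python
import ipaddress

# Signature values as I understand them from RFC 4391 (assumption).
SIGNATURES = {0x401B: "IPv4", 0x601B: "IPv6"}


def split_ipoib_mgid(text):
    """Split a textual MGID into scope, signature, P_Key and group ID."""
    raw = int(ipaddress.IPv6Address(text))
    return {
        "scope": (raw >> 112) & 0xF,
        "signature": SIGNATURES.get((raw >> 96) & 0xFFFF, "unknown"),
        "pkey": (raw >> 80) & 0xFFFF,
        "group_id": raw & ((1 << 80) - 1),
    }


# The two debugfs entries and the MGID from the opensm log line.
for gid in ("ff12:401b:ffff:0:0:0:0:3a01",
            "ff12:401b:ffff:0:0:0:ffff:ffff",
            "ff12:601b:ffff::16"):
    fields = split_ipoib_mgid(gid)
    print(gid, fields["signature"], hex(fields["pkey"]),
          hex(fields["group_id"]))
```

Run on the GIDs quoted in this mail, this reports the 401b signature for both debugfs entries and the 601b signature for the MGID in the opensm log line.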

--
Stuart Stanley
M: 952-457-3790
[email protected]
--
"The avalanche has started. It is too late for the pebbles to vote." - Kosh in Babylon 5:"Believers"
