On Wed, 2007-04-11 at 05:49, Michael S. Tsirkin wrote:
> > Quoting Hal Rosenstock <[EMAIL PROTECTED]>:
> > Subject: Re: multicast join failed for...
> >
> > On Mon, 2007-04-09 at 18:47, Egor Tur wrote:
> > > Hi folk.
> > >
> > > > > ib1: multicast join failed for
> > > > > ff12:601b:ffff:0000:0000:0000:0000:0001, status -22
> > > > > ib0: multicast join failed for
> > > > > ff12:601b:ffff:0000:0000:0000:0000:0001, status -22
> > > > >
> > > > > And in osm.log:
> > > > > Apr 09 21:33:50 658439 [42003960] -> __osm_mcmr_rcv_join_mgrp: ERR
> > > > > 1B12: __validate_more_comp_fields,
> > > > > __validate_port_caps, or JoinState = 0 failed from port
> > > > > 0x001708ffffd15099 (HP Lion Cub DDR 128MB),
> > > > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > > >
> > > > OpenSM ERR 1B12 means that the rate or MTU of the port was incompatible
> > > > with the MC group. You could turn on -V with OpenSM and see more log
> > > > messages as to what is going on wrong from the SM's perspective.
> > >
> > > Ok.
> > > This from osm.log with -V:
> > >
> > > Apr 10 00:56:06 390007 [44007960] -> __osm_sa_mad_ctrl_process: [
> > > Apr 10 00:56:06 390016 [44007960] -> __osm_sa_mad_ctrl_process: Posting
> > > Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD
> > > Apr 10 00:56:06 390027 [44007960] -> __osm_sa_mad_ctrl_process: ]
> > > Apr 10 00:56:06 390033 [44007960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> > > Apr 10 00:56:06 390046 [41001960] -> osm_mcmr_rcv_process: [
> > > Apr 10 00:56:06 390054 [41001960] -> __osm_mcmr_rcv_join_mgrp: [
> > > Apr 10 00:56:06 390060 [41001960] -> __osm_mcmr_rcv_join_mgrp: Dump of
> > > incoming record
> > > Apr 10 00:56:06 390065 [41001960] -> MCMember Record dump:
> > > MGID....................0xff12601bffff0000 : 0x0000000000000001
> > > PortGid.................0xfe80000000000000 : 0x001708ffffd1509a
> > > qkey....................0xB1B
> > > mlid....................0x0
> > > mtu.....................0x84
> > > TClass..................0x0
> > > pkey....................0xFFFF
> > > rate....................0x83
> > > pkt_life................0x0
> > > SLFlowLabelHopLimit.....0x0
> > > ScopeState..............0x1
> > > ProxyJoin...............0x0
> > > Apr 10 00:56:06 390084 [41001960] -> __validate_more_comp_fields:
> > > Requested RATE 6 is not equal to 3
> >
> > Rate 6 is 20 Gb/sec whereas 3 is 10 Gb/sec. So the port is 4x DDR (rate
> > 6) and the group is 4x SDR. The request is for equal to the rate so it
> > fails.
> >
> BTW, the only reason I know for IPoIB to request a specific rate
> is if the broadcast multicast group has that rate. Roland, is that right?
The IPoIB RFC says that non-broadcast multicast groups must use the same
parameters as the broadcast group. It also looks to me like that is what
the code implements.

> So, how come the broadcast multicast group has rate DDR, but a specific
> group has lower rate?

I think the error is not that the broadcast and some non-broadcast groups
have different rates, but that the SM is rejecting the join of a DDR port
to an SDR group. I wonder whether the broadcast group was formed properly
(I asked about that but haven't heard back yet).

-- Hal
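
For anyone reading the record dump in this thread, here is a rough sketch of how the mtu and rate bytes can be unpacked. It assumes the usual MCMemberRecord layout, where the top 2 bits are a selector and the low 6 bits are the encoded value. The helper names and the lookup tables are my own for illustration and are not taken from OpenSM; the tables list only a subset of the encodings.

```python
# Sketch only: decode the selector/value bytes seen in an MCMemberRecord dump.
# Assumption: top 2 bits = selector, low 6 bits = encoded value.
# Names below are hypothetical and are not OpenSM identifiers.

SELECTORS = {
    0: "greater than",
    1: "less than",
    2: "exactly",
    3: "largest available",
}

# Subset of the encoded rate values (Gb/sec), kept as strings for printing.
RATES_GBPS = {2: "2.5", 3: "10", 5: "5", 6: "20"}

# Encoded MTU values in bytes.
MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def split_selector(field):
    """Return (selector, value) for one selector/value byte."""
    return field >> 6, field & 0x3F


def describe_rate(field):
    """Render a rate byte such as 0x83 in readable form."""
    selector, value = split_selector(field)
    rate = RATES_GBPS.get(value, "unknown")
    return f"{SELECTORS[selector]} {rate} Gb/sec"


def describe_mtu(field):
    """Render an mtu byte such as 0x84 in readable form."""
    selector, value = split_selector(field)
    mtu = MTU_BYTES.get(value, "unknown")
    return f"{SELECTORS[selector]} {mtu} bytes"


if __name__ == "__main__":
    print("rate 0x83:", describe_rate(0x83))
    print("mtu  0x84:", describe_mtu(0x84))
```

With the values from the dump, rate 0x83 splits into selector 2 and value 3, and mtu 0x84 into selector 2 and value 4. That matches the "equal to" wording in the log message: an exact-match request for rate 3 cannot be satisfied against rate 6.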
