On Wed, 2007-04-11 at 05:49, Michael S. Tsirkin wrote:
> > Quoting Hal Rosenstock <[EMAIL PROTECTED]>:
> > Subject: Re: multicast join failed for...
> >
> > On Mon, 2007-04-09 at 18:47, Egor Tur wrote:
> > > Hi folk.
> > >
> > > > > ib1: multicast join failed for
> > > > > ff12:601b:ffff:0000:0000:0000:0000:0001, status -22
> > > > > ib0: multicast join failed for
> > > > > ff12:601b:ffff:0000:0000:0000:0000:0001, status -22
> > > > >
> > > > > And in osm.log:
> > > > > Apr 09 21:33:50 658439 [42003960] -> __osm_mcmr_rcv_join_mgrp: ERR
> > > > > 1B12: __validate_more_comp_fields,
> > > > > __validate_port_caps, or JoinState = 0 failed from port
> > > > > 0x001708ffffd15099 (HP Lion Cub DDR 128MB),
> > > > > sending IB_SA_MAD_STATUS_REQ_INVALID
> > > > >
> > > > OpenSM ERR 1B12 means that the rate or MTU of the port was incompatible
> > > > with the MC group. You could turn on -V with OpenSM and see more log
> > > > messages as to what is going on wrong from the SM's perspective.
> > >
> > > Ok.
> > > This from osm.log with -V :
> > >
> > > Apr 10 00:56:06 390007 [44007960] -> __osm_sa_mad_ctrl_process: [
> > > Apr 10 00:56:06 390016 [44007960] -> __osm_sa_mad_ctrl_process: Posting
> > > Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD
> > > Apr 10 00:56:06 390027 [44007960] -> __osm_sa_mad_ctrl_process: ]
> > > Apr 10 00:56:06 390033 [44007960] -> __osm_sa_mad_ctrl_rcv_callback: ]
> > > Apr 10 00:56:06 390046 [41001960] -> osm_mcmr_rcv_process: [
> > > Apr 10 00:56:06 390054 [41001960] -> __osm_mcmr_rcv_join_mgrp: [
> > > Apr 10 00:56:06 390060 [41001960] -> __osm_mcmr_rcv_join_mgrp: Dump of
> > > incoming record
> > > Apr 10 00:56:06 390065 [41001960] -> MCMember Record dump:
> > >
> > > MGID....................0xff12601bffff0000 : 0x0000000000000001
> > > PortGid.................0xfe80000000000000 : 0x001708ffffd1509a
> > > qkey....................0xB1B
> > > mlid....................0x0
> > > mtu.....................0x84
> > > TClass..................0x0
> > > pkey....................0xFFFF
> > > rate....................0x83
> > > pkt_life................0x0
> > > SLFlowLabelHopLimit.....0x0
> > > ScopeState..............0x1
> > > ProxyJoin...............0x0
> > >
> > > Apr 10 00:56:06 390084 [41001960] -> __validate_more_comp_fields:
> > > Requested RATE 6 is not equal to 3
> >
> > Rate 6 is 20 Gb/sec whereas 3 is 10 Gb/sec. So the port is 4x DDR (rate
> > 6) and the group is 4x SDR. The request is for equal to the rate so it
> > fails.
> >
> BTW, the only reason I know for IPoIB to request a specific rate
> is if the broadcast multicast group has that rate. Roland, is that right?
>
> So, how come the broadcast multicast group has rate DDR, but a specific
> group has lower rate?

Why does this IPoIB client think that the broadcast group is 4x DDR
(20 Gbps) when the SM thinks it is 4x SDR (10 Gbps)? How could that
happen? Is this a porting issue somehow?

-- Hal
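
For anyone reading the record dump above, here is a rough sketch of how the
mtu and rate fields could be decoded by hand. It assumes the usual
MCMemberRecord layout as I understand it: the top 2 bits of each byte are a
selector and the low 6 bits are the value code. The lookup tables and the
function names are mine, for illustration only; they are not OpenSM code, and
only the codes 3 (10 Gb/sec) and 6 (20 Gb/sec) come from this thread.

```python
# Selector and value tables (assumed from the InfiniBand encoding; only
# rate codes 3 and 6 are confirmed by the discussion above).
SELECTORS = {0: "greater than", 1: "less than", 2: "exactly", 3: "largest available"}
RATE_GBPS = {2: 2.5, 3: 10, 4: 30, 5: 5, 6: 20, 7: 40, 8: 60, 9: 80, 10: 120}
MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def decode(field):
    """Split an 8-bit rate/mtu field into (selector, value code)."""
    return (field >> 6) & 0x3, field & 0x3F


def describe_rate(field):
    """Render a rate field such as 0x83 in readable form."""
    sel, val = decode(field)
    return f"{SELECTORS[sel]} {RATE_GBPS.get(val, '?')} Gb/sec (code {val})"


def describe_mtu(field):
    """Render an mtu field such as 0x84 in readable form."""
    sel, val = decode(field)
    return f"{SELECTORS[sel]} {MTU_BYTES.get(val, '?')} bytes (code {val})"


if __name__ == "__main__":
    # Values taken from the MCMember Record dump in the quoted log.
    print("rate 0x83:", describe_rate(0x83))
    print("mtu  0x84:", describe_mtu(0x84))
```

Under those assumptions, rate 0x83 reads as selector 2 ("exactly") with rate
code 3, and mtu 0x84 as selector 2 with MTU code 4, which lines up with the
"is not equal to" comparison that __validate_more_comp_fields logs.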
