Sasha,

I am running OFED 1.3.1. My subnet manager is opensmd. /var/log/opensm.log shows the following:

Sep 19 14:21:19 480217 [43806960] 0x02 -> SUBNET UP
Sep 19 14:21:19 818276 [41001960] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 (Channel Adapter) from LID:0x0011 TID:0x0000000000000000
Sep 19 14:21:19 818330 [41001960] 0x02 -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0011 GID:0xfe80000000000000,0x0002c9020027d451
Sep 19 14:21:19 823408 [43806960] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches
Sep 19 14:21:19 827220 [43806960] 0x02 -> SUBNET UP
Sep 19 14:21:27 283873 [41802960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending IB_SA_MAD_STATUS_REQ_INVALID
Sep 19 14:21:43 298367 [42804960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending IB_SA_MAD_STATUS_REQ_INVALID
Sep 19 14:21:59 312765 [42003960] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR 1B12: __validate_more_comp_fields, __validate_port_caps, or JoinState = 0 failed from port 0x0002c9020026e4c1 ( HCA-1), sending IB_SA_MAD_STATUS_REQ_INVALID

Rebooting the node that failed to join the group always seems to solve the problem.

Thanks for your help.

-Roger

> -----Original Message-----
> From: Sasha Khapyorsky [mailto:[EMAIL PROTECTED]
> Sent: Friday, September 19, 2008 1:06 PM
> To: Roger Spellman
> Cc: [email protected]
> Subject: Re: [ofa-general] Intermittent: ib0: multicast join failed
>
> On 16:45 Thu 18 Sep , Roger Spellman wrote:
> > I have many nodes, each with a Mellanox MT25204. When I reboot some
> > nodes, they occasionally get the following error:
> >
> > ib0: multicast join failed
>
> What is the software stack? Which version?
>
> > Rebooting the system almost always solves this problem.
> >
> > What causes this?
>
> What are SM you using? If it is OpenSM you can see in the log
> (/vat/log/opensm.log) why the join failed.
>
> > Is there a way to solve this without rebooting?
>
> Hard to say - the reason for failure is unknown. I could be port's low
> speed/width or something else, hard to say without any details.
>
> Sasha
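
For anyone chasing the same symptom, the repeated ERR 1B12 entries in the log excerpt above can be tallied per port GUID with a short script. This is only a sketch: the regular expression assumes the opensm.log layout exactly as quoted in this message (timestamp, microseconds, thread id, log level, "->", function name, "ERR <code>:", and a "from port 0x..." field), with one log entry per line. The function name `count_join_failures` is made up for illustration and is not part of OpenSM or OFED.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines quoted in this thread:
# "Sep 19 14:21:27 283873 [41802960] 0x01 -> func: ERR 1B12: ... from port 0x<guid> ..."
ERR_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}"      # month, day, hh:mm:ss
    r"\s+\d+"                                 # microseconds
    r"\s+\[[0-9A-Fa-f]+\]"                    # thread id in brackets
    r"\s+0x[0-9A-Fa-f]+\s+->\s+"              # log level and arrow
    r"(?P<func>\w+):\s+ERR\s+(?P<code>[0-9A-Fa-f]{4}):"
    r".*?from port\s+(?P<guid>0x[0-9A-Fa-f]+)"
)


def count_join_failures(lines, code="1B12"):
    """Count log entries with the given ERR code, keyed by port GUID.

    `lines` is any iterable of log lines (a list or an open file).
    Lines that do not match the assumed layout are skipped.
    """
    counts = Counter()
    for line in lines:
        m = ERR_RE.match(line.strip())
        if m and m.group("code").upper() == code.upper():
            counts[m.group("guid").lower()] += 1
    return counts
```

To run it against a real log, pass an open file handle, for example `count_join_failures(open("/var/log/opensm.log", errors="replace"))`. A port GUID that shows up repeatedly points at the node whose join requests the SM keeps rejecting; it does not by itself say why the request was invalid.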
