Hi Sasha,

Yes, the problem seems to appear only when there is an SM migration. I receive in-service notices for other ports, as long as there is no SM migration occurring.

Thanks,
Lan

On 7/26/07, Sasha Khapyorsky <[EMAIL PROTECTED]> wrote:
>
> On 12:37 Thu 26 Jul , lbt wrote:
> > Thanks for the suggestion Sasha!
> >
> > Our host stack does receive a reregistration notice and does resubscribe
> > all handlers at that point in time. At the time of the SM migration, our
> > stack prints out some informational messages to confirm this:
> > Jul 18 14:31:09 localhost kernel: Event IB_EVENT_CLIENT_REREGISTER occurred on port 1
> > Jul 18 14:31:09 localhost kernel: OpemSM migrated, old SM LID=1 new SM LID=8
> >
> > I also confirmed in the SM logs that after the migration, the higher
> > priority SM is getting a subscription request for the in-service trap:
> > Jul 18 14:32:13 103550 [41E02960] -> osm_infr_rcv_process_set_method: Subscribe Request with QPN: 0x000001
> > Jul 18 14:32:13 103554 [41E02960] -> osm_infr_get_by_rec: [
> > Jul 18 14:32:13 103558 [41E02960] -> __dump_all_informs: [
> > Jul 18 14:32:13 103562 [41E02960] -> InformInfo dump:
> >         gid.....................0x0000000000000000 : 0x0000000000000000
> >         lid_range_begin.........0xFFFF
> >         lid_range_end...........0x0
> >         is_generic..............0x1
> >         subscribe...............0x0
> >         trap_type...............0x3
> >         trap_num................64
> >         qpn.....................0x000001
> >         resp_time_val...........0x0
> >         node_type...............0x000004
> > Jul 18 14:32:13 103569 [41E02960] -> __dump_all_informs: ]
> >
> > It may be a problem if the resubscription of the in-service handler
> > occurs after the in-service notice was forwarded, but I think the
> > problem is that there is never a notice that is forwarded for the
> > higher priority SM port that is restored.
>
> And after OpenSM migration, did you receive in-service notices for
> other ports? Does the problem happen only at migration time?
>
> >
> > Perhaps neither SM (the lower priority or the higher priority one)
> > generates an in-service trap because of the timing gap between when the
> > restored port is detected and "marked" (i.e. added to new_ports_list)
> > and when in-service traps are generated for new ports. During SM
> > migration, the lower priority SM detects the new port, but the higher
> > priority SM does the trap generation (but it doesn't realize that its
> > own port is a new port and thus doesn't generate a trap for it).
> >
> > Our host stack executes some functions when a port is restored (in our
> > in-service subscription handler). Am I not supposed to receive an
> > in-service trap for a restored port that happens to be the Master SM,
>
> Yes, I guess you are.
>
> > and instead execute these actions with a client reregistration event?
>
> Client reregistration request is not suitable here - SM can ask for
> client reregistration at any time (in practice OpenSM now does it only
> when it enters MASTER state, but it is also optional).
>
> Sasha
>
> >
> > Thanks again for your help!
> > Lan
> >
> > On 7/25/07, Sasha Khapyorsky <[EMAIL PROTECTED]> wrote:
> > >
> > > Hi Lan,
> > >
> > > On 09:57 Wed 25 Jul , lbt wrote:
> > > > Hello,
> > > >
> > > > I have been seeing a problem where a subscriber for in-service traps
> > > > is not getting informed when the port of the master OpenSM is
> > > > restored (i.e. causing an SM migration).
> > > >
> > > > I have an IB subnet with 2 nodes running OpenSM, with different
> > > > priorities of course (OpenSM Rev:openib-2.0.5). I also have another
> > > > node on the subnet that has subscribed for the forwarding of any
> > > > IB_SA_GENERIC_TRAP_NUM_IN_SVC trap events. I've been doing cable
> > > > pull tests on the IB ports, to check if the in-service handler I
> > > > have subscribed gets invoked when I restore the cable. I've noticed
> > > > that everything works as expected (i.e. my in-service handler is
> > > > invoked) whenever I restore the cable on the lower priority SM IB
> > > > port without ever touching the master SM port. But if I cause an SM
> > > > migration, by restoring the port of the higher priority SM, the
> > > > in-service trap does not get generated as expected on a cable
> > > > restore.
> > > >
> > > > Steps to Reproduce:
> > > > 1) Start with port to higher priority SM disconnected.
> > > > 2) Restore port cable on the higher priority SM
> > > > --> This causes an SM migration as expected; the SM migration
> > > >     happens okay
> > > > --> I expected the restoration of the higher priority SM to also
> > > >     trigger an in-service trap and notify subscribers, but it
> > > >     doesn't occur
> > > >
> > > > I have collected debug message logs for both OpenSM instances, and
> > > > it appears that the reason is:
> > > > 1) in-service traps are generated based on what ports are added on
> > > >    the Master SM's new_ports_list, but these traps are generated
> > > >    only after LID assignment
> > > > 2) when the higher priority SM port is restored, the restored port
> > > >    gets added to the lower priority SM's new_ports_list (since it's
> > > >    still the Master SM at that point in time)
> > > > 3) the handover of Master SM from lower priority to higher priority
> > > >    SM occurs (before LID assignment and thus before there is a
> > > >    chance for traps to get generated for those ports on
> > > >    new_ports_list)
> > > > 4) the higher priority SM is now Master SM, but it has an empty
> > > >    new_ports_list, so no trap is generated either
> > > >
> > > > Does this look like a legitimate OpenSM bug? Any feedback would be
> > > > much appreciated, and if I can help further in any way please let
> > > > me know.
> > >
> > > As far as I know when OpenSM (even old like 2.0.5) becomes master it
> > > requests client to reregister SA related stuff (by setting this bit in
> > > PortInfo).
> > >
> > > Probably your port doesn't support this (you could verify by seeing
> > > PortInfo:CapabilityMask - use 'smpquery portinfo <client-port-lid>') or
> > > maybe your host stack doesn't do reregistration?
> > >
> > > Anyway you could track this in the OpenSM code in osm_lid_mgr.c
> > > __osm_lid_mgr_set_physp_pi(), whether the client reregistration bit is
> > > set (with ib_port_info_set_client_rereg()) or not. Then we will know
> > > more about this problem.
> > >
> > > Sasha
> > >
> > > >
> > > > Subset of logs from lower priority SM during the cable restore of
> > > > higher priority SM port:
> > > > ### Jul 18 14:31:56 614522 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x03 num:128 Producer:2 from LID:0x000A TID:0x00000016000012e1
> > > > ### Jul 18 14:31:56 614823 [41401960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_IDLE
> > > > ### 14:31:56 ******************** INITIATING HEAVY SWEEP **********************
> > > > ### Jul 18 14:31:56 616887 [42803960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_NO_PENDING_TRANSACTIONS in state OSM_SM_STATE_SWEEP_HEAVY_SELF
> > > > Jul 18 14:31:56 626078 [42803960] -> __osm_ni_rcv_process_new: Adding port GUID:0x00504501483e0000 to new_ports_list
> > > > Jul 18 14:31:56 626524 [42803960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_CHANGE_DETECTED in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET
> > > > Jul 18 14:31:56 632630 [41E02960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_NO_PENDING_TRANSACTIONS in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET
> > > > 14:31:56 ********************* HEAVY SWEEP COMPLETE ***********************
> > > > Jul 18 14:31:56 632773 [41E02960] -> osm_sm_state_mgr_process: Received signal OSM_SM_SIGNAL_HANDOVER_SENT in state IB_SMINFO_STATE_MASTER###
> > > > 14:31:56 ******************** ENTERING SM STANDBY STATE *******************
> > > >
> > > > Subset of logs from higher priority SM during the cable restore of
> > > > higher priority SM port:
> > > >
> > > > Jul 18 14:32:02 995600 [41401960] -> osm_sm_state_mgr_process: [
> > > > Jul 18 14:32:02 995605 [41401960] -> osm_sm_state_mgr_process: Received signal OSM_SM_SIGNAL_DISCOVERY_COMPLETED in state IB_SMINFO_STATE_DISCOVERING
> > > > Jul 18 14:32:02 995609 [41401960] -> Entering MASTER state
> > > > Jul 18 14:32:02 995888 [41401960] -> __osm_sm_state_mgr_master_msg: ******************** ENTERING SM MASTER STATE ********************
> > > > Jul 18 14:32:03 009014 [41401960] -> __osm_state_mgr_set_sm_lid_done_msg: **** SM LID ASSIGNMENT COMPLETE - STARTING SUBNET LID CONFIG *****
> > > > Jul 18 14:32:03 024047 [41E02960] -> __osm_state_mgr_lid_assign_msg ***** LID ASSIGNMENT COMPLETE - STARTING SWITCH TABLE CONFIG *****
> > > > Jul 18 14:32:03 024052 [41E02960] -> __osm_state_mgr_report_new_ports: [
> > > > ----> no in-service traps are generated and notices forwarded
> > > >       because there are no ports on this list
> > > > Jul 18 14:32:03 024057 [41E02960] -> __osm_state_mgr_report_new_ports: ]
> > > >
> > > > Thanks!
> > > > Lan
> > >
> > > > _______________________________________________
> > > > general mailing list
> > > > [email protected]
> > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > > >
> > > > To unsubscribe, please visit
> > > > http://openib.org/mailman/listinfo/openib-general
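
For readers following the thread, the four-step sequence from the original report quoted above can be sketched as a toy model. This is only an illustration of the hypothesis as described in the report, not OpenSM code: the ToySM class and its sweep() and report_new_ports() methods are hypothetical names, and the only values taken from the thread are the new_ports_list name and the port GUID from the lower priority SM's log.

```python
# Toy model of the race described in the report. Nothing here is OpenSM
# code; ToySM and its methods are hypothetical names for illustration.

RESTORED_GUID = 0x00504501483E0000  # port GUID seen in the quoted log


class ToySM:
    def __init__(self, name, priority, is_master=False):
        self.name = name
        self.priority = priority
        self.is_master = is_master
        self.new_ports_list = []  # ports marked as new during a sweep
        self.traps_sent = []      # in-service notices this SM generated

    def sweep(self, discovered_guids):
        # Assumption from the report: only the current master records
        # newly discovered ports on its own new_ports_list.
        if self.is_master:
            self.new_ports_list.extend(discovered_guids)

    def report_new_ports(self):
        # Assumption from the report: runs only after LID assignment and
        # emits one in-service trap per port on this SM's own list.
        sent = list(self.new_ports_list)
        self.traps_sent.extend(sent)
        self.new_ports_list.clear()
        return sent


def restore_without_migration():
    # Same SM detects the port and later reports it: trap is generated.
    master = ToySM("low", priority=1, is_master=True)
    master.sweep([RESTORED_GUID])
    return master.report_new_ports()


def restore_with_migration():
    low = ToySM("low", priority=1, is_master=True)
    high = ToySM("high", priority=2)
    low.sweep([RESTORED_GUID])                   # step 2: low marks the port
    low.is_master, high.is_master = False, True  # step 3: handover first
    traps = high.report_new_ports()              # step 4: high's list is empty
    return traps, low.new_ports_list
```

In this model restore_without_migration() yields a trap for the restored port, while restore_with_migration() yields none and leaves the GUID stranded on the former master's list, which matches the behaviour reported in the thread.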
