Hello, I have been seeing a problem where a subscriber for in-service traps is not notified when the port of the master OpenSM is restored (i.e. when the restore causes an SM migration).
I have an IB subnet with 2 nodes running OpenSM (OpenSM Rev:openib-2.0.5), with different priorities of course. I also have another node on the subnet that has subscribed for the forwarding of any IB_SA_GENERIC_TRAP_NUM_IN_SVC trap events.

I've been doing cable pull tests on the IB ports to check whether the in-service handler I have subscribed gets invoked when I restore the cable. Everything works as expected (i.e. my in-service handler is invoked) whenever I restore the cable on the lower priority SM's IB port without ever touching the master SM port. But if I cause an SM migration by restoring the port of the higher priority SM, the in-service trap is not generated on the cable restore.

Steps to Reproduce:
1) Start with the port to the higher priority SM disconnected.
2) Restore the port cable on the higher priority SM.
   --> This causes an SM migration as expected, and the migration itself happens okay.
   --> I expected the restoration of the higher priority SM's port to also trigger an in-service trap and notify subscribers, but it doesn't occur.

I have collected debug logs from both OpenSM instances, and the reason appears to be:
1) In-service traps are generated based on which ports are added to the master SM's new_ports_list, but these traps are generated only after LID assignment.
2) When the higher priority SM's port is restored, the restored port gets added to the lower priority SM's new_ports_list (since it is still the master SM at that point in time).
3) The handover of master SM from the lower priority SM to the higher priority SM occurs (before LID assignment, and thus before traps have a chance to be generated for the ports on new_ports_list).
4) The higher priority SM is now the master SM, but it has an empty new_ports_list, so it generates no trap either.

Does this look like a legitimate OpenSM bug? Any feedback would be much appreciated, and if I can help further in any way please let me know.
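To make the sequence I suspect concrete, here is a minimal toy model in Python. It is not OpenSM code: ToySM, heavy_sweep, handover_to and lid_assignment_complete are made-up names, and the assumption that new_ports_list is not passed along at handover is only my reading of the logs, not something I have confirmed in the source.

```python
# Toy model of the suspected sequence; NOT OpenSM code.
# All class and method names here are hypothetical, for illustration only.

class ToySM:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority
        self.is_master = False
        self.new_ports_list = []  # ports first seen during a heavy sweep
        self.traps_sent = []      # in-service traps forwarded to subscribers

    def heavy_sweep(self, discovered_guids):
        # Only the current master sweeps and records newly seen ports.
        if self.is_master:
            self.new_ports_list.extend(discovered_guids)

    def handover_to(self, other):
        # Assumption: mastership moves, but new_ports_list stays behind.
        self.is_master = False
        other.is_master = True

    def lid_assignment_complete(self):
        # New ports are reported (traps generated) only at this point,
        # and only by the SM that is master when it gets here.
        if self.is_master:
            self.traps_sent.extend(self.new_ports_list)
            self.new_ports_list.clear()


def simulate_restore_of_higher_priority_port():
    low = ToySM("lower priority SM", priority=1)
    high = ToySM("higher priority SM", priority=2)
    low.is_master = True                # higher priority SM port is down
    restored_guid = 0x00504501483e0000  # GUID from the lower priority SM log

    low.heavy_sweep([restored_guid])    # port lands on low's new_ports_list
    low.handover_to(high)               # handover before LID assignment
    high.lid_assignment_complete()      # high's list is empty: no trap
    low.lid_assignment_complete()       # low is standby now: no trap
    return low, high


if __name__ == "__main__":
    low, high = simulate_restore_of_higher_priority_port()
    for sm in (low, high):
        print(sm.name, "traps:", sm.traps_sent, "pending:", sm.new_ports_list)
```

In this model the restored port stays stranded on the old master's list and neither SM ever reports it, which matches what I see in the logs.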
Subset of logs from lower priority SM during the cable restore of higher priority SM port:

### Jul 18 14:31:56 614522 [41401960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x03 num:128 Producer:2 from LID:0x000A TID:0x00000016000012e1
### Jul 18 14:31:56 614823 [41401960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_SWEEP in state OSM_SM_STATE_IDLE
### 14:31:56 ******************** INITIATING HEAVY SWEEP **********************
### Jul 18 14:31:56 616887 [42803960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_NO_PENDING_TRANSACTIONS in state OSM_SM_STATE_SWEEP_HEAVY_SELF
Jul 18 14:31:56 626078 [42803960] -> __osm_ni_rcv_process_new: Adding port GUID:0x00504501483e0000 to new_ports_list
Jul 18 14:31:56 626524 [42803960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_CHANGE_DETECTED in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET
Jul 18 14:31:56 632630 [41E02960] -> osm_state_mgr_process: Received signal OSM_SIGNAL_NO_PENDING_TRANSACTIONS in state OSM_SM_STATE_SWEEP_HEAVY_SUBNET
14:31:56 ********************* HEAVY SWEEP COMPLETE ***********************
Jul 18 14:31:56 632773 [41E02960] -> osm_sm_state_mgr_process: Received signal OSM_SM_SIGNAL_HANDOVER_SENT in state IB_SMINFO_STATE_MASTER
### 14:31:56 ******************** ENTERING SM STANDBY STATE *******************

Subset of logs from higher priority SM during the cable restore of higher priority SM port:

Jul 18 14:32:02 995600 [41401960] -> osm_sm_state_mgr_process: [
Jul 18 14:32:02 995605 [41401960] -> osm_sm_state_mgr_process: Received signal OSM_SM_SIGNAL_DISCOVERY_COMPLETED in state IB_SMINFO_STATE_DISCOVERING
Jul 18 14:32:02 995609 [41401960] -> Entering MASTER state
Jul 18 14:32:02 995888 [41401960] -> __osm_sm_state_mgr_master_msg: ******************** ENTERING SM MASTER STATE ********************
Jul 18 14:32:03 009014 [41401960] -> __osm_state_mgr_set_sm_lid_done_msg: **** SM LID ASSIGNMENT COMPLETE - STARTING SUBNET LID CONFIG *****
Jul 18 14:32:03 024047 [41E02960] -> __osm_state_mgr_lid_assign_msg ***** LID ASSIGNMENT COMPLETE - STARTING SWITCH TABLE CONFIG *****
Jul 18 14:32:03 024052 [41E02960] -> __osm_state_mgr_report_new_ports: [
----> no in-service traps are generated and notices forwarded because there are no ports on this list
Jul 18 14:32:03 024057 [41E02960] -> __osm_state_mgr_report_new_ports: ]

Thanks!
Lan
