Hi Todd,

Todd Bowman wrote:
OpenSM Rev:openib-3.0.13

Can you upgrade to OFED 1.3.1?
We had a bug that caused opensm to drop the wrong transactions,
and the errors in your log could be caused by that. The bug was fixed
in OFED 1.3.

-- Yevgeny

OpenSM segfaulted during an initialization that seems to have been triggered by a link state trap (type 1, num 128).


09:49:51 914967 [41001960] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x011A TID:0x00000000000016cc
09:49:51 948014 [41001960] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x011A GID:0xfe80000000000000,0x0008f104003f0ab5
09:49:51 948477 [41802960] -> osm_report_notice: Reporting Generic Notice type:3 num:67 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad
09:49:51 948497 [41802960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad
09:49:51 948502 [41802960] -> __osm_drop_mgr_remove_port: Removed port with GUID:0x0002c90200207801 LID range [0x89,0x89] of node:n1008
09:49:51 948519 [41802960] -> osm_report_notice: Reporting Generic Notice type:3 num:67 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad
09:49:51 948529 [41802960] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad
...
...
...

09:49:51 962126 [41802960] -> __osm_drop_mgr_remove_port: Removed port with GUID:0x0002c902002064ad LID range [0xFD,0xFD] of node:hn HCA-1
09:49:52 044097 [41802960] -> __osm_lid_mgr_process_our_sm_node: ERR 0308: Can't acquire SM's port object, GUID 0x0002c902002064ad
09:49:52 098558 [41001960] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state OSM_SM_STATE_SET_SUBNET_UCAST_LIDS_WAIT
09:49:52 098917 [41001960] -> __osm_state_mgr_check_tbl_consistency: ERR 3322: lid 0x6E is wrongly assigned to port 0x0008f104003f2cdb in port_lid_tbl
09:49:52 098936 [41001960] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x00FD GID:0xfe80000000000000,0x0002c902002064ad
09:49:52 098944 [41001960] -> __osm_state_mgr_report_new_ports: Discovered new port with GUID:0x0008f104003f2cdb LID range [0x0,0x0] of node:ISR9288/ISR9096 Voltaire sLB-24
09:49:52 098957 [41001960] -> osm_ucast_mgr_process: null (min-hop) tables configured on all switches
09:49:52 098992 [41001960] -> __osm_ucast_mgr_process_port: ERR 3A04: Port 0x8f104003f2cdb has LID 0. An initialization error occurred. Ignoring port
09:49:52 103405 [41802960] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state OSM_SM_STATE_SET_LINK_PORTS_WAIT
09:49:52 103626 [41001960] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state OSM_SM_STATE_SET_LINK_PORTS_WAIT
09:49:52 103856 [41001960] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state OSM_SM_STATE_SET_LINK_PORTS_WAIT
09:49:52 104077 [41802960] -> __osm_state_mgr_signal_error: ERR 3303: Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state OSM_SM_STATE_SET_LINK_PORTS_WAIT
...
...
...
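
When a log like the one above arrives with entries run together, a small script can help separate them and show which ERR codes dominate. The following is only a rough sketch, not part of OpenSM: the helper names split_entries and tally_errors are hypothetical, and it assumes every entry starts with "HH:MM:SS <6-digit usec> [<thread id>] -> " as in the excerpt, and that error codes are four hex digits after "ERR".

```python
import re
from collections import Counter

# Assumed entry prefix, taken from the excerpt: "09:49:52 044097 [41802960] -> "
# The lookahead splits *before* each prefix without consuming it (Python 3.7+).
ENTRY_START = re.compile(r"(?=\d{2}:\d{2}:\d{2} \d{6} \[[0-9a-fA-F]+\] -> )")
# Assumed error format: "ERR " followed by a four-hex-digit code and a colon.
ERR_CODE = re.compile(r"ERR ([0-9A-F]{4}):")

def split_entries(text):
    """Split run-together log text into one string per log entry."""
    return [e.strip() for e in ENTRY_START.split(text) if e.strip()]

def tally_errors(text):
    """Count how many times each ERR code appears in the text."""
    return Counter(ERR_CODE.findall(text))

# Three entries copied from the log above, joined the way they were posted.
sample = (
    "09:49:52 044097 [41802960] -> __osm_lid_mgr_process_our_sm_node: ERR 0308: "
    "Can't acquire SM's port object, GUID 0x0002c902002064ad "
    "09:49:52 098558 [41001960] -> __osm_state_mgr_signal_error: ERR 3303: "
    "Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state "
    "OSM_SM_STATE_SET_SUBNET_UCAST_LIDS_WAIT "
    "09:49:52 103405 [41802960] -> __osm_state_mgr_signal_error: ERR 3303: "
    "Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state "
    "OSM_SM_STATE_SET_LINK_PORTS_WAIT"
)

entries = split_entries(sample)
counts = tally_errors(sample)

for entry in entries:
    print(entry)
for code, n in counts.most_common():
    print(f"ERR {code}: {n}")
```

A tally like this only shows which errors are frequent; it does not by itself say which one led to the segfault.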


1) Why does the link down trap start the long chain of __osm_drop_mgr_remove_port calls?

2) Which of the errors may have caused the segfault?



Thanks,
Todd


------------------------------------------------------------------------

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
