Hi Todd,
Todd Bowman wrote:
OpenSM Rev:openib-3.0.13
Can you upgrade to OFED 1.3.1?
We had a bug that caused opensm to drop the wrong transactions,
and the errors in your log could be caused by that. The bug was fixed
in OFED 1.3.
-- Yevgeny
The opensm segfaulted during an initialization that seems to have been
the result of a link state trap (type 1, num 128):
09:49:51 914967 [41001960] -> __osm_trap_rcv_process_request:
Received Generic Notice type:0x01 num:128 Producer:2 from
LID:0x011A TID:0x00000000000016cc
09:49:51 948014 [41001960] -> osm_report_notice: Reporting Generic
Notice type:1 num:128 from LID:0x011A
GID:0xfe80000000000000,0x0008f104003f0ab5
09:49:51 948477 [41802960] -> osm_report_notice: Reporting Generic
Notice type:3 num:67 from LID:0x00FD
GID:0xfe80000000000000,0x0002c902002064ad
09:49:51 948497 [41802960] -> osm_report_notice: Reporting Generic
Notice type:3 num:65 from LID:0x00FD
GID:0xfe80000000000000,0x0002c902002064ad
09:49:51 948502 [41802960] -> __osm_drop_mgr_remove_port: Removed port
with GUID:0x0002c90200207801 LID range [0x89,0x89] of node:n1008
09:49:51 948519 [41802960] -> osm_report_notice: Reporting Generic
Notice type:3 num:67 from LID:0x00FD
GID:0xfe80000000000000,0x0002c902002064ad
09:49:51 948529 [41802960] -> osm_report_notice: Reporting Generic
Notice type:3 num:65 from LID:0x00FD
GID:0xfe80000000000000,0x0002c902002064ad
...
...
...
09:49:51 962126 [41802960] -> __osm_drop_mgr_remove_port: Removed port
with GUID:0x0002c902002064ad LID range [0xFD,0xFD] of node:hn HCA-1
09:49:52 044097 [41802960] -> __osm_lid_mgr_process_our_sm_node: ERR
0308: Can't acquire SM's port object, GUID 0x0002c902002064ad
09:49:52 098558 [41001960] -> __osm_state_mgr_signal_error: ERR 3303:
Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state
OSM_SM_STATE_SET_SUBNET_UCAST_LIDS_WAIT
09:49:52 098917 [41001960] -> __osm_state_mgr_check_tbl_consistency: ERR
3322: lid 0x6E is wrongly assigned to port 0x0008f104003f2cdb in
port_lid_tbl
09:49:52 098936 [41001960] -> osm_report_notice: Reporting Generic
Notice type:3 num:64 from LID:0x00FD
GID:0xfe80000000000000,0x0002c902002064ad
09:49:52 098944 [41001960] -> __osm_state_mgr_report_new_ports:
Discovered new port with GUID:0x0008f104003f2cdb LID range [0x0,0x0] of
node:ISR9288/ISR9096 Voltaire sLB-24
09:49:52 098957 [41001960] -> osm_ucast_mgr_process: null (min-hop)
tables configured on all switches
09:49:52 098992 [41001960] -> __osm_ucast_mgr_process_port: ERR 3A04:
Port 0x8f104003f2cdb has LID 0. An initialization error occurred.
Ignoring port
09:49:52 103405 [41802960] -> __osm_state_mgr_signal_error: ERR 3303:
Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state
OSM_SM_STATE_SET_LINK_PORTS_WAIT
09:49:52 103626 [41001960] -> __osm_state_mgr_signal_error: ERR 3303:
Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state
OSM_SM_STATE_SET_LINK_PORTS_WAIT
09:49:52 103856 [41001960] -> __osm_state_mgr_signal_error: ERR 3303:
Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state
OSM_SM_STATE_SET_LINK_PORTS_WAIT
09:49:52 104077 [41802960] -> __osm_state_mgr_signal_error: ERR 3303:
Invalid signal OSM_SIGNAL_CHANGE_DETECTED(2) in state
OSM_SM_STATE_SET_LINK_PORTS_WAIT
...
...
...
1) Why does the link down trap start the long chain of
__osm_drop_mgr_remove_port calls?
2) Which of the errors may have caused the segfault?
Thanks,
Todd
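
For question 2, it may help to see which ERR codes dominate the log before the crash. The following is only a rough sketch, not part of OpenSM: it assumes the log is saved as plain text in the layout quoted above (an "ERR" token followed by a four-digit hex code and a colon, possibly wrapped onto the next line). The name tally_errors is made up for illustration.

```python
import re
from collections import Counter

# Assumption: error entries look like "ERR 3303:" and the mail client may
# have wrapped the code onto the next line ("ERR\n0308:"), so any
# whitespace, including a newline, is allowed between the two tokens.
ERR_RE = re.compile(r"\bERR\s+([0-9A-Fa-f]{4}):")


def tally_errors(log_text):
    """Return a Counter mapping each ERR code (upper-cased) to its count."""
    return Counter(m.group(1).upper() for m in ERR_RE.finditer(log_text))


if __name__ == "__main__":
    import sys

    # Usage: python tally_errors.py < osm.log
    for code, count in tally_errors(sys.stdin.read()).most_common():
        print(f"ERR {code}: {count}")
```

Sorting by frequency only shows which messages repeat most; it does not by itself identify the one that led to the segfault.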
------------------------------------------------------------------------
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general