Hi all:
I have an IB cluster of eight IBM HS21 blades running a mix of RHEL 5.2 Server
and SLES 10 SP2, all connected to one IB switch. opensm runs as the subnet
manager on one blade, and ibcheckerrors used to complete without errors.
Last week I added eight IBM LS21 blades, connected to a second IB switch.
After I linked the two switches and brought up all the IB adapters on the
new blades, ibcheckerrors reported this error:
[EMAIL PROTECTED] ~]# ibcheckerrors
#warn: counter RcvErrors = 5691 (threshold 10) lid 3 port 1
Error check on lid 3 (gaia-07 HCA-1) port 1: FAILED
## Summary: 19 nodes checked, 0 bad nodes found
## 46 ports checked, 1 ports have errors beyond threshold
[EMAIL PROTECTED] ~]# ibv_devinfo
hca_id: mlx4_0
    fw_ver:          2.3.000
    node_guid:       0002:c903:0001:3370
    sys_image_guid:  0002:c903:0001:3373
    vendor_id:       0x02c9
    vendor_part_id:  25418
    hw_ver:          0xA0
    board_id:        IBM08A0000001
    phys_port_cnt:   2
        port: 1
            state:       PORT_ACTIVE (4)
            max_mtu:     2048 (4)
            active_mtu:  2048 (4)
            sm_lid:      15
            port_lid:    3
            port_lmc:    0x00
        port: 2
            state:       PORT_DOWN (1)
            max_mtu:     2048 (4)
            active_mtu:  2048 (4)
            sm_lid:      0
            port_lid:    0
            port_lmc:    0x00
[EMAIL PROTECTED] ~]# ibcheckport 3 1
[EMAIL PROTECTED] ~]# echo $?
0
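Each ibcheckerrors warning line above packs in the counter name, its current value, the threshold it was compared against, and the LID/port it belongs to. A minimal Python sketch that pulls those fields out of such a line, assuming the exact line format shown in this output (the regex, CounterWarning and parse_warning are hypothetical helpers, not part of infiniband-diags):

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for "#warn: counter ..." lines; it assumes the exact
# layout of the ibcheckerrors output above and may not match other versions.
WARN_RE = re.compile(
    r"#warn: counter (?P<counter>\w+) = (?P<value>\d+) "
    r"\(threshold (?P<threshold>\d+)\) lid (?P<lid>\d+) port (?P<port>\d+)"
)


class CounterWarning(NamedTuple):
    counter: str
    value: int
    threshold: int
    lid: int
    port: int

    @property
    def exceeds(self) -> bool:
        # True when the counter value is above the reported threshold.
        return self.value > self.threshold


def parse_warning(line: str) -> Optional[CounterWarning]:
    """Return the parsed warning, or None if the line is not a counter warning."""
    m = WARN_RE.search(line)
    if m is None:
        return None
    return CounterWarning(
        counter=m.group("counter"),
        value=int(m.group("value")),
        threshold=int(m.group("threshold")),
        lid=int(m.group("lid")),
        port=int(m.group("port")),
    )


if __name__ == "__main__":
    w = parse_warning("#warn: counter RcvErrors = 5691 (threshold 10) lid 3 port 1")
    print(w)
```

Applied to the warning from gaia-07, this yields RcvErrors = 5691 against a threshold of 10 on lid 3 port 1, while summary lines such as "Error check on lid 3 ...: FAILED" return None.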
I have disabled the embedded subnet managers on both IB switches. The issue
persists even after I move the subnet manager to another machine. ib0 on
gaia-07 can still communicate with the other machines. All installed IB
adapters are ConnectX 4x SDR, and both switches are Topspin switches. Can
anyone give some advice on this issue? Thanks in advance!
Wen Hao Wang
Email: [EMAIL PROTECTED]
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general