On 16-12-2011 10:14, Alex Netes wrote:
> Hi Gerben,
> 
> It's complaining about the link rate:
> 
> Dec 15 23:35:05 792236 [46B9F940] 0x04 -> validate_port_caps: Port's RATE 2 
> is less than 3
> 
> Probably the host that is trying to join is connected via a 1x cable.
> The rate is defined by the capabilities of the host that opened the group,
> so you see this problem only when the host with the higher rate created
> the MC group.
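
For reference, the numbers in "Port's RATE 2 is less than 3" are encoded
InfiniBand rate values, not Gb/s. The sketch below decodes them using the
standard rate encoding (2 = 2.5 Gb/s, 3 = 10 Gb/s, and so on). The helper
names rate_gbps and port_can_join are hypothetical, and the comparison only
illustrates the idea behind the check; it is not OpenSM's actual code.

```python
# Encoded InfiniBand rate values (as carried in PortInfo and
# MCMemberRecord fields) mapped to nominal rates in Gb/s.
IB_RATE_GBPS = {
    2: 2.5,    # 1x SDR
    3: 10.0,   # 4x SDR
    4: 30.0,
    5: 5.0,
    6: 20.0,   # e.g. 4x DDR
    7: 40.0,
    8: 60.0,
    9: 80.0,
    10: 120.0,
}


def rate_gbps(code):
    """Return the nominal rate in Gb/s for an encoded rate value."""
    return IB_RATE_GBPS[code]


def port_can_join(port_rate_code, group_rate_code):
    """Hypothetical helper: a port may join only if its decoded rate
    is at least the rate the multicast group was created with."""
    return rate_gbps(port_rate_code) >= rate_gbps(group_rate_code)


# The values from the log line: port rate 2 against group rate 3.
print(rate_gbps(2), rate_gbps(3), port_can_join(2, 3))
```

Read this way, the log line says the joining port reports 2.5 Gb/s (a 1x
SDR link) while the group requires 10 Gb/s (4x SDR), which matches the
1x-link explanation above.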

Is it possible to force them to some specified speed?

The strange thing is that both hosts show this problem when they start
opensm; both log the same errors in /var/log/opensm.log. Both hosts have
the same HCA:

[root@titus ~]# lspci -v |grep Infini
0a:00.0 InfiniBand: Mellanox Technologies MT26418 [ConnectX VPI PCIe 2.0
5GT/s - IB DDR / 10GigE] (rev a0)

[root@vespasianus ~]# lspci -v |grep Infini
0a:00.0 InfiniBand: Mellanox Technologies MT26418 [ConnectX VPI PCIe 2.0
5GT/s - IB DDR / 10GigE] (rev a0)

The hosts are connected to each other's single port via one IB cable.

[root@vespasianus ~]# grep -A1 -B1 INVALID /var/log/opensm.log| tail

Dec 16 11:35:10 041359 [483D2940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
from port 0x001e8c0000c84b62 (titus HCA-1), sending
IB_SA_MAD_STATUS_REQ_INVALID
Dec 16 11:35:10 041365 [483D2940] 0x10 -> osm_sa_send_error: [
--
Dec 16 11:35:17 351591 [429C9940] 0x04 -> validate_port_caps: Port's
RATE 2 is less than 3
Dec 16 11:35:17 351598 [429C9940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
from port 0x001e8c0000b90641 (vespasianus HCA-1), sending
IB_SA_MAD_STATUS_REQ_INVALID
Dec 16 11:35:17 351604 [429C9940] 0x10 -> osm_sa_send_error: [
--
Dec 16 11:35:18 042907 [43DCB940] 0x04 -> validate_port_caps: Port's
RATE 2 is less than 3
Dec 16 11:35:18 042914 [43DCB940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
from port 0x001e8c0000c84b62 (titus HCA-1), sending
IB_SA_MAD_STATUS_REQ_INVALID
Dec 16 11:35:18 042920 [43DCB940] 0x10 -> osm_sa_send_error: [
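
To see at a glance which ports are being rejected, the ERR 1B12 entries can
be counted per port. This is a minimal sketch that assumes the log format
shown in the excerpt above; the function name count_join_failures is
hypothetical, and the sample text is copied from that excerpt.

```python
import re
from collections import Counter

# Matches one ERR 1B12 entry and captures the port GUID and the node
# description. DOTALL lets the match span line breaks, because long log
# lines are wrapped in the mail.
ERR_1B12 = re.compile(
    r"ERR 1B12:.*?from\s+port\s+(0x[0-9a-fA-F]+)\s+\(([^)]*)\)",
    re.DOTALL,
)


def count_join_failures(log_text):
    """Count rejected multicast joins per (port GUID, node description)."""
    return Counter(ERR_1B12.findall(log_text))


sample = """\
Dec 16 11:35:17 351598 [429C9940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
from port 0x001e8c0000b90641 (vespasianus HCA-1), sending
IB_SA_MAD_STATUS_REQ_INVALID
Dec 16 11:35:18 042914 [43DCB940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
from port 0x001e8c0000c84b62 (titus HCA-1), sending
IB_SA_MAD_STATUS_REQ_INVALID
"""

for (guid, node), count in count_join_failures(sample).items():
    print(guid, node, count)
```

For a real run, the contents of /var/log/opensm.log would be read and
passed in place of the sample string.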

Gerben


> 
> On 09:56 Fri 16 Dec     , Gerben Roest wrote:
>> On 16-12-2011 1:06, Ira Weiny wrote:
>>> On Thu, 15 Dec 2011 15:17:24 -0800
>>> Gerben Roest <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Starting opensm from OFED 1.5.1, 1.5.3.2, 1.5.4 on a Scientific Linux 5
>>>> machine, directly linked to its neighbour (a twin 1U setup) gives me no
>>>> connection but lots of errors in /var/log/opensm.log, like these:
>>>>
>>>> Dec 15 22:38:35 685651 [45AFD940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
>>>> validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
>>>> from port 0x001e8c0000b90641 (vespasianus HCA-1), sending
>>>> IB_SA_MAD_STATUS_REQ_INVALID
>>>> Dec 15 22:38:35 686174 [464FE940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
>>>> validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
>>>> from port 0x001e8c0000c84b62 (titus HCA-1), sending
>>>> IB_SA_MAD_STATUS_REQ_INVALID
>>>>
>>>> Does anyone know what happens here? Another twin node has no problems,
>>>> that one uses OFED-1.5.1.
>>>>
>>>> I can send a "-V" log of opensm or any config files if you like,
>>>
>>> Just set -D 0x7 which adds VERBOSE and send the snippet around the above 
>>> errors.
>>
>> Dec 15 23:35:05 791001 [4399A940] 0x10 -> osm_vendor_send: [
>> Dec 15 23:35:05 791008 [4399A940] 0x04 -> osm_vendor_send: RMPP 0 length 256
>> Dec 15 23:35:05 791021 [4399A940] 0x10 -> osm_vendor_put: [
>> Dec 15 23:35:05 791028 [4399A940] 0x08 -> osm_vendor_put: Retiring UMAD
>> 0x3dd9290
>> Dec 15 23:35:05 791034 [4399A940] 0x10 -> osm_vendor_put: ]
>> Dec 15 23:35:05 791040 [4399A940] 0x08 -> osm_vendor_send: Completed
>> sending response or unsolicited p_madw = 0x3ddf5c0
>> Dec 15 23:35:05 791046 [4399A940] 0x10 -> osm_vendor_send: ]
>> Dec 15 23:35:05 791051 [4399A940] 0x10 -> osm_sa_send_error: ]
>> Dec 15 23:35:05 791057 [4399A940] 0x10 -> mcmr_rcv_join_mgrp: ]
>> Dec 15 23:35:05 791062 [4399A940] 0x10 -> osm_mcmr_rcv_process: ]
>> Dec 15 23:35:05 791068 [4399A940] 0x10 -> sa_mad_ctrl_disp_done_callback: [
>> Dec 15 23:35:05 791073 [4399A940] 0x10 -> osm_vendor_put: [
>> Dec 15 23:35:05 791079 [4399A940] 0x08 -> osm_vendor_put: Retiring UMAD
>> 0x3dd7290
>> Dec 15 23:35:05 791084 [4399A940] 0x10 -> osm_vendor_put: ]
>> Dec 15 23:35:05 791090 [4399A940] 0x10 -> sa_mad_ctrl_disp_done_callback: ]
>> Dec 15 23:35:05 792086 [4B1A6940] 0x10 -> osm_vendor_get: [
>> Dec 15 23:35:05 792106 [4B1A6940] 0x08 -> osm_vendor_get: Acquiring UMAD
>> for p_madw = 0x3ddf5d8, size = 256
>> Dec 15 23:35:05 792117 [4B1A6940] 0x08 -> osm_vendor_get: Acquired UMAD
>> 0x3dd7290, size = 256
>> Dec 15 23:35:05 792126 [4B1A6940] 0x10 -> osm_vendor_get: ]
>> Dec 15 23:35:05 792132 [4B1A6940] 0x10 -> sa_mad_ctrl_rcv_callback: [
>> Dec 15 23:35:05 792139 [4B1A6940] 0x08 -> sa_mad_ctrl_rcv_callback: 4 SA
>> MADs received
>> Dec 15 23:35:05 792152 [4B1A6940] 0x20 -> SA MAD dump:
>>                                 base_ver................0x1
>>                                 mgmt_class..............0x3
>>                                 class_ver...............0x2
>>                                 method..................0x2 (SubnAdmSet)
>>                                 status..................0x0
>>                                 resv....................0x0
>>                                 trans_id................0x53bf6d21e
>>                                 attr_id.................0x38
>> (MCMemberRecord)
>>                                 resv1...................0x0
>>                                 attr_mod................0x0
>>                                 rmpp_version............0x0
>>                                 rmpp_type...............0x0
>>                                 rmpp_flags..............0x0
>>                                 rmpp_status.............0x0
>>                                 seg_num.................0x0
>>                                 payload_len/new_win.....0x0
>>                                 sm_key..................0x0000000000000000
>>                                 attr_offset.............0x0
>>                                 resv2...................0x0
>>                                 comp_mask...............0x0000000000010083
>>
>>
>> Dec 15 23:35:05 792158 [4B1A6940] 0x10 -> sa_mad_ctrl_process: [
>> Dec 15 23:35:05 792165 [4B1A6940] 0x08 -> sa_mad_ctrl_process: Posting
>> Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD
>> Dec 15 23:35:05 792187 [4B1A6940] 0x10 -> sa_mad_ctrl_process: ]
>> Dec 15 23:35:05 792194 [4B1A6940] 0x10 -> sa_mad_ctrl_rcv_callback: ]
>> Dec 15 23:35:05 792204 [46B9F940] 0x10 -> osm_mcmr_rcv_process: [
>> Dec 15 23:35:05 792211 [46B9F940] 0x10 -> mcmr_rcv_join_mgrp: [
>> Dec 15 23:35:05 792216 [46B9F940] 0x08 -> mcmr_rcv_join_mgrp: Dump of
>> incoming record
>> Dec 15 23:35:05 792228 [46B9F940] 0x08 -> MCMember Record dump:
>>
>> MGID....................ff12:401b:ffff::ffff:ffff
>>                                 PortGid.................fe80::1e:8c00:b9:641
>>                                 qkey....................0x0
>>                                 mlid....................0x0
>>                                 mtu.....................0x0
>>                                 TClass..................0x0
>>                                 pkey....................0xFFFF
>>                                 rate....................0x0
>>                                 pkt_life................0x0
>>                                 SLFlowLabelHopLimit.....0x0
>>                                 ScopeState..............0x1
>>                                 ProxyJoin...............0x0
>> Dec 15 23:35:05 792236 [46B9F940] 0x04 -> validate_port_caps: Port's
>> RATE 2 is less than 3
>> Dec 15 23:35:05 792243 [46B9F940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B12:
>> validate_more_comp_fields, validate_port_caps, or JoinState = 0 failed
>> from port 0x001e8c0000b90641 (vespasianus HCA-1), sending
>> IB_SA_MAD_STATUS_REQ_INVALID
>> Dec 15 23:35:05 792253 [46B9F940] 0x10 -> osm_sa_send_error: [
>> Dec 15 23:35:05 792260 [46B9F940] 0x10 -> osm_vendor_get: [
>> Dec 15 23:35:05 792266 [46B9F940] 0x08 -> osm_vendor_get: Acquiring UMAD
>> for p_madw = 0x3dd73f8, size = 256
>> Dec 15 23:35:05 792273 [46B9F940] 0x08 -> osm_vendor_get: Acquired UMAD
>> 0x3dd9290, size = 256
>> Dec 15 23:35:05 792279 [46B9F940] 0x10 -> osm_vendor_get: ]
>> Dec 15 23:35:05 792291 [46B9F940] 0x20 -> SA MAD dump:
>>                                 base_ver................0x1
>>                                 mgmt_class..............0x3
>>                                 class_ver...............0x2
>>                                 method..................0x81
>> (SubnAdmGetResp)
>>                                 status..................0x200
>>                                 resv....................0x0
>>                                 trans_id................0x53bf6d21e
>>                                 attr_id.................0x38
>> (MCMemberRecord)
>>                                 resv1...................0x0
>>                                 attr_mod................0x0
>>                                 rmpp_version............0x0
>>                                 rmpp_type...............0x0
>>                                 rmpp_flags..............0x0
>>                                 rmpp_status.............0x0
>>                                 seg_num.................0x0
>>                                 payload_len/new_win.....0x0
>>                                 sm_key..................0x0000000000000000
>>                                 attr_offset.............0x0
>>                                 resv2...................0x0
>>                                 comp_mask...............0x0000000000010083
>>
>>
>> Dec 15 23:35:05 792298 [46B9F940] 0x10 -> osm_vendor_send: [
>> Dec 15 23:35:05 792304 [46B9F940] 0x04 -> osm_vendor_send: RMPP 0 length 256
>> Dec 15 23:35:05 792318 [46B9F940] 0x10 -> osm_vendor_put: [
>> Dec 15 23:35:05 792325 [46B9F940] 0x08 -> osm_vendor_put: Retiring UMAD
>> 0x3dd9290
>> Dec 15 23:35:05 792331 [46B9F940] 0x10 -> osm_vendor_put: ]
>> Dec 15 23:35:05 792337 [46B9F940] 0x08 -> osm_vendor_send: Completed
>> sending response or unsolicited p_madw = 0x3dd73e0
>> Dec 15 23:35:05 792343 [46B9F940] 0x10 -> osm_vendor_send: ]
>> Dec 15 23:35:05 792360 [46B9F940] 0x10 -> osm_sa_send_error: ]
>> Dec 15 23:35:05 792366 [46B9F940] 0x10 -> mcmr_rcv_join_mgrp: ]
>> Dec 15 23:35:05 792371 [46B9F940] 0x10 -> osm_mcmr_rcv_process: ]
>> Dec 15 23:35:05 792377 [46B9F940] 0x10 -> sa_mad_ctrl_disp_done_callback: [
>> Dec 15 23:35:05 792383 [46B9F940] 0x10 -> osm_vendor_put: [
>> Dec 15 23:35:05 792388 [46B9F940] 0x08 -> osm_vendor_put: Retiring UMAD
>> 0x3dd7e40
>> Dec 15 23:35:05 792394 [46B9F940] 0x10 -> osm_vendor_put: ]
>> Dec 15 23:35:05 792400 [46B9F940] 0x10 -> sa_mad_ctrl_disp_done_callback: ]
>> Dec 15 23:35:09 759207 [4A7A5940] 0x08 -> sm_sweeper: Off schedule sweep
>> signalled
>> Dec 15 23:35:09 759229 [4A7A5940] 0x10 -> osm_state_mgr_process: [
>> Dec 15 23:35:09 759240 [4A7A5940] 0x08 -> osm_state_mgr_process:
>> Received signal OSM_SIGNAL_SWEEP in state MASTER
>> Dec 15 23:35:09 759249 [4A7A5940] 0x10 -> state_mgr_sweep_hop_0: [
>> Dec 15 23:35:09 759258 [4A7A5940] 0x04 -> state_mgr_sweep_hop_0:
>>
>>
>>
>> thanks,
>>
>> Gerben
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to [email protected]
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 

Grep IT                      tel: 0252-769005
Egelantier 3                 fax: 0252-769006
2211 NN Noordwijkerhout     [email protected]
The Netherlands
