On 5/18/2012 10:35 AM, Bob Ciotti wrote:
> On 05/18/2012 06:07 AM, Hal Rosenstock wrote:
>> On 5/18/2012 2:05 AM, Bob Ciotti wrote:
>>>
>>>
>>> I'm seeing lots of these messages in SM log:
>>>
>>> May 17 22:36:04 947774 [DA234710] 0x01 ->  log_trap_info: Received
>>> Generic Notice type:1 num:131 (Flow Control Update watchdog timer
>>> expired) Producer:2 (Switch) from LID:444 Port 5 TID:0x0000000000000025
>>>
>>> the referenced port is a switch to HCA link.
>>>
>>> I've seen this in cases where there was bad hardware. The spec says it
>>> indicates a failure in the flow control machine on the other end. But let's
>>> assume the hardware is good. When could this occur?
>>
>> Do OperationalVLs match on both sides of the link? Are you
>> using/configuring QoS?
>>
> 
> 
> There are two separate fabrics, one on each port of a 2-port HCA.
> The issue is seen on both fabrics.

So these are dual-homed HCAs on disjoint IB subnets.

> Normally we use QoS on both fabrics. QoS now disabled on
> ib0 on hca port 1:

Is the watchdog timeout still observed on the fabric to which HCA port 1
is attached?
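
One way to answer that from the SM log is to tally trap 131 notices per
reporting LID and port. The following is only a minimal Python sketch. It
assumes each notice sits on a single line in the log (the message quoted
earlier in this thread was wrapped by the mailer) and follows the same
log_trap_info layout; the function name tally_traps is hypothetical.

```python
import re
from collections import Counter

# Assumed layout, taken from the log_trap_info line quoted in this thread:
#   ... Received Generic Notice type:1 num:131 (<text>) Producer:2 (Switch)
#   from LID:444 Port 5 TID:0x...
TRAP_RE = re.compile(
    r"Generic Notice type:(\d+) num:(\d+) \((.*?)\).*?from LID:(\d+)(?: Port (\d+))?"
)


def tally_traps(log_text, trap_num=131):
    """Count notices with the given trap number, keyed by (LID, port)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = TRAP_RE.search(line)
        if m and int(m.group(2)) == trap_num:
            # The port field is treated as optional in case a notice omits it.
            port = int(m.group(5)) if m.group(5) else None
            counts[(int(m.group(4)), port)] += 1
    return counts
```

Running this separately on each subnet's SM log would show whether the
notices are confined to a few switch ports or spread across the fabric.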

> 
> r327i7n0 ~ # smpquery portinfo 248 | grep VL
> VLCap:...........................VL0-7
> VLHighLimit:.....................4
> VLArbHighCap:....................8
> VLArbLowCap:.....................8
> VLStallCount:....................0
> OperVLs:.........................VL0-7
> r327i7n0 ~ # smpquery -D portinfo 0 1 | grep VL
> VLCap:...........................VL0-7
> VLHighLimit:.....................4
> VLArbHighCap:....................8
> VLArbLowCap:.....................8
> VLStallCount:....................0
> OperVLs:.........................VL0-7
> r327i7n0 ~ # smpquery -D portinfo 0,1 1 | grep VL
> VLCap:...........................VL0-7
> VLHighLimit:.....................4
> VLArbHighCap:....................8
> VLArbLowCap:.....................8
> VLStallCount:....................7
> OperVLs:.........................VL0-7

OperVLs is VL0-7 on both ends of the link, so it's not an OperVLs
mismatch issue.
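
For checking many links rather than eyeballing the output, the comparison
could be scripted. This is a rough Python sketch that only parses the
"Name:....value" lines that smpquery portinfo prints, as seen above; the
helper names parse_portinfo and oper_vls_match are made up for
illustration.

```python
def parse_portinfo(text):
    """Parse 'Name:.....value' lines (as printed by smpquery) into a dict."""
    fields = {}
    for line in text.splitlines():
        # Tolerate mail quoting ("> ") in front of pasted output.
        line = line.lstrip("> ").strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        # Values are padded with dots after the colon.
        fields[name.strip()] = value.lstrip(".").strip()
    return fields


def oper_vls_match(local_text, peer_text):
    """True only if both outputs carry an OperVLs field and they are equal."""
    local = parse_portinfo(local_text).get("OperVLs")
    peer = parse_portinfo(peer_text).get("OperVLs")
    return local is not None and local == peer
```

Feeding it the HCA-side and switch-side output quoted above returns True,
since both report VL0-7.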

> r327i7n0 ~ # ibstat
> CA 'mlx4_0'
>     CA type: MT4099
>     Number of ports: 2
>     Firmware version: 2.10.4350
>     Hardware version: 0
>     Node GUID: 0x0002c90300336b20
>     System image GUID: 0x0002c90300336b23
>     Port 1:
>         State: Active
>         Physical state: LinkUp
>         Rate: 56
>         Base lid: 248
>         LMC: 0
>         SM lid: 1
>         Capability mask: 0x02514868
>         Port GUID: 0x0002c90300336b21
>         Link layer: InfiniBand
>     Port 2:
>         State: Active
>         Physical state: LinkUp
>         Rate: 56
>         Base lid: 1971
>         LMC: 0
>         SM lid: 1685
>         Capability mask: 0x02514868
>         Port GUID: 0x0002c90300336b22
>         Link layer: InfiniBand
> 
> r327i7n0 ~ # smpquery -D nodeinfo 0,1 1
> # Node info: DR path slid 65535; dlid 65535; 0,1
> BaseVers:........................1
> ClassVers:.......................1
> NodeType:........................Switch
> NumPorts:........................36
> SystemGuid:......................0x080069000000a4db
> Guid:............................0x080069000000a4d8
> PortGuid:........................0x080069000000a4d8
> PartCap:.........................8
> DevId:...........................0xc738
> Revision:........................0x000000a1
> LocalPort:.......................1
> VendorId:........................0x0002c9
> 
> r327i7n0 ~ # smpquery -D nodedesc 0,1
> Node Description:.SwitchX -  Mellanox Technologies

What does vendstat -N against this switch report? Do you know what
firmware is running there?

-- Hal

> r327i7n0 ~ # smpquery -D sl2vl 0,1 1
> # SL2VL table: DR path slid 65535; dlid 65535; 0,1
> #                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> ports: in  0, out  1: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
> ports: in  1, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in  2, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in  3, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in  4, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in  5, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in  6, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in  7, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in  8, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in  9, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 10, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 11, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 12, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 13, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 14, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 15, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 16, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 17, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 18, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 19, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 20, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 21, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 22, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 23, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 24, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 25, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 26, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 27, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 28, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 29, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 30, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 31, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 32, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 33, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 34, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 35, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> ports: in 36, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 0| 1| 2| 3| 4| 5| 6| 7|
> 
> r327i7n0 ~ # smpquery -D sl2vl 0 1
> # SL2VL table: DR path slid 65535; dlid 65535; 0
> #                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> ports: in  0, out  0: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
> 
> r327i7n0 ~ # smpquery -D vlarb 0,1 1
> # VLArbitration tables: DR path slid 65535; dlid 65535; 0,1 port 1
> LowCap 8 HighCap 8
> # Low priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
> WEIGHT: |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |
> # High priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
> WEIGHT: |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |0x1 |
> 
> r327i7n0 ~ # smpquery -D vlarb 0 1
> # VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 1 LowCap
> 8 HighCap 8
> # Low priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
> WEIGHT: |0x20|0x20|0x20|0x20|0x20|0x20|0x20|0x20|
> # High priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
> WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
> 
> 
> On ib1 (HCA port 2), QoS is enabled:
> 
> r327i7n0 ~ # smpquery -P2 -D sl2vl 0 2
> # SL2VL table: DR path slid 65535; dlid 65535; 0
> #                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> ports: in  0, out  0: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> 
> r327i7n0 ~ # smpquery -P2 -D sl2vl 0,2 1
> # SL2VL table: DR path slid 65535; dlid 65535; 0,2
> #                 SL: | 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|13|14|15|
> ports: in  0, out  1: | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0|
> ports: in  1, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in  2, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in  3, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in  4, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in  5, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in  6, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in  7, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in  8, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in  9, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 10, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 11, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 12, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 13, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 14, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 15, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 16, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 17, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 18, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 19, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 20, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 21, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 22, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 23, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 24, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 25, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 26, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 27, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 28, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 29, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 30, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 31, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 32, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 33, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 34, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 35, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> ports: in 36, out  1: | 0| 1| 2| 3| 4| 5| 6| 7| 3| 4| 5| 6| 7| 3| 4| 5|
> 
> r327i7n0 ~ # smpquery -P2 -D vlarb 0,2 1
> # VLArbitration tables: DR path slid 65535; dlid 65535; 0,2 port 1
> LowCap 8 HighCap 8
> # Low priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
> WEIGHT: |0x0 |0x0 |0x0 |0x40|0x40|0x40|0x40|0x40|
> # High priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x0 |0x0 |0x0 |0x0 |0x0 |
> WEIGHT: |0x80|0x40|0x40|0x0 |0x0 |0x0 |0x0 |0x0 |
> 
> r327i7n0 ~ # smpquery -P2 -D vlarb 0 2
> # VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 2 LowCap
> 8 HighCap 8
> # Low priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x6 |0x7 |
> WEIGHT: |0x0 |0x0 |0x0 |0x40|0x40|0x40|0x40|0x40|
> # High priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x0 |0x0 |0x0 |0x0 |0x0 |
> WEIGHT: |0x80|0x40|0x40|0x0 |0x0 |0x0 |0x0 |0x0 |
> 
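
With QoS enabled, one sanity check on tables like these is that every data
VL an SL maps to has a nonzero weight in at least one arbitration table,
since zero-weight entries are skipped by the arbiter. The Python sketch
below does that against pasted smpquery sl2vl and vlarb output. It is a
configuration check only, not a claimed cause of the watchdog trap, and the
function names are hypothetical.

```python
def parse_row(line):
    """Split a '|'-delimited smpquery row into ints (0x prefix handled)."""
    cells = line.split("|")[1:]  # drop the label before the first '|'
    return [int(c.strip(), 0) for c in cells if c.strip()]


def serviced_vls(vlarb_text):
    """Return the VLs with a nonzero weight in either arbitration table."""
    vls, serviced = None, set()
    for line in vlarb_text.splitlines():
        line = line.lstrip("> ")  # tolerate mail quoting
        if line.startswith("VL "):
            vls = parse_row(line)
        elif line.startswith("WEIGHT") and vls is not None:
            weights = parse_row(line)
            serviced.update(v for v, w in zip(vls, weights) if w > 0)
            vls = None
    return serviced


def unserviced_vls(sl2vl_row, vlarb_text):
    """Data VLs referenced by one SL2VL row that no table entry services."""
    used = set(parse_row(sl2vl_row)) - {15}  # VL15 is not a data VL
    return used - serviced_vls(vlarb_text)
```

For the ib1 tables quoted above, the low-priority table covers VL3-7 and
the high-priority table covers VL0-2, so unserviced_vls() returns an empty
set for the SL2VL rows shown.
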
>>> Only in the case of FW bug?
>>
>> I don't think flow control is performed by FW.
>>
>>> Any tunables that might impact this?
>>
>> No IBA standard ones AFAIK. Who's the HCA vendor?
>>
>> -- Hal
>>
>>> bob
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>> the body of a message to [email protected]
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
> 
