Yevgeny,
>
> OK, so there are three possible reasons that I can think of:
> 1. Something is wrong in the configuration.
> 2. The application does not saturate the link, thus QoS
>   and the whole VL arbitration thing doesn't kick in.
> 3. There's some bug, somewhere.
>
> Let's start with reason no. 1.
> Please shut off each of the SLs one by one, and
> make sure that the application gets zero BW on
> these SLs. You can do it by mapping SL to VL15:
>
> qos_sl2vl      0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15
If I shut off this SL by mapping it to VL15, the interfaces stop
responding to ping. Is this because some IPoIB multicast traffic for
pkey 0x7fff gets cut off?

So I have no results for this one.
>
> and then
> qos_sl2vl      0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15
>
With this setup, and the following QoS settings:

qos_max_vls    8
qos_high_limit 1
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
qos_vlarb_low  0:1,1:64,2:128,3:192,4:0,5:0
qos_sl2vl      0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15
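
For reference, here is a rough sketch of the split I would expect from these settings if the link were saturated. It assumes that the low-priority arbiter serves VLs in proportion to their qos_vlarb_low weights when all of them have data queued, that an SL mapped to VL15 gets no data bandwidth, and that the traffic under test uses SL1 to SL3. The helper names (parse_vlarb, expected_shares) are made up for illustration and are not part of OpenSM.

```python
# Sketch only: expected low-priority bandwidth shares for the settings above.
# Assumption: with every active VL backlogged, bandwidth is shared in
# proportion to the qos_vlarb_low weights; SLs mapped to VL15 get nothing.

def parse_vlarb(text):
    """Parse 'vl:weight,vl:weight,...' into {vl: total weight}."""
    table = {}
    for entry in text.split(","):
        vl, weight = entry.split(":")
        table[int(vl)] = table.get(int(vl), 0) + int(weight)
    return table

def expected_shares(sl2vl, vlarb_low, active_sls):
    """Return {sl: fraction of bandwidth} for the given active SLs.

    Assumes each active SL maps to a distinct VL, as in the mail.
    """
    vls = [int(v) for v in sl2vl.split(",")]
    weights = parse_vlarb(vlarb_low)
    sl_weight = {}
    for sl in active_sls:
        vl = vls[sl]
        sl_weight[sl] = 0 if vl == 15 else weights.get(vl, 0)
    total = sum(sl_weight.values())
    return {sl: (w / total if total else 0.0) for sl, w in sl_weight.items()}

QOS_SL2VL = "0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15"
QOS_VLARB_LOW = "0:1,1:64,2:128,3:192,4:0,5:0"

# Assumption: SL1, SL2 and SL3 are the SLs carrying the iperf traffic.
shares = expected_shares(QOS_SL2VL, QOS_VLARB_LOW, active_sls=[1, 2, 3])
for sl, share in sorted(shares.items()):
    print(f"SL{sl}: {share:.0%}")
```

Under those assumptions this gives 25% for SL1, 0% for SL2 and 75% for SL3, which is not what the measurements show.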

I get roughly the same bandwidth on SL1 through SL3:

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t 10 -P 8 2>&1; done | grep SUM
[SUM]  0.0-10.0 sec  6.15 GBytes  5.28 Gbits/sec
[SUM]  0.0-10.0 sec  6.00 GBytes  5.16 Gbits/sec
[SUM]  0.0-10.1 sec  5.38 GBytes  4.59 Gbits/sec

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone -t 10 -P 8 2>&1; done | grep SUM
[SUM]  0.0-10.0 sec  6.09 GBytes  5.23 Gbits/sec
[SUM]  0.0-10.0 sec  6.41 GBytes  5.51 Gbits/sec
[SUM]  0.0-10.0 sec  4.72 GBytes  4.05 Gbits/sec

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t 10 -P 8 2>&1; done | grep SUM
[SUM]  0.0-10.1 sec  6.96 GBytes  5.92 Gbits/sec
[SUM]  0.0-10.1 sec  5.89 GBytes  5.00 Gbits/sec
[SUM]  0.0-10.0 sec  5.35 GBytes  4.58 Gbits/sec

> and then
> qos_sl2vl      0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15
Same results as with the previous 0,1,15,3,... qos_sl2vl mapping.
>
> If this part works well, then we will continue to
> reason no. 2.
In the above tests, I used -P 8 to force 8 threads on the client side
for each test.
I have one quad-core CPU (Intel E55400).
That makes 24 iperf threads on 4 cores, which __should__ be fine (well,
I suppose ...)

Regarding reason no. 3: I still get the error I got yesterday, which
you told me was not important because the SLs set in partitions.conf
would override what was read from qos-policy.conf in the first place.

Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS Level SL (3)
Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS Level SL (2)
Nov 25 13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS Level SL (1)
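
In case it helps to see the pkey-to-SL assignment at a glance, here is a small sketch that pulls it out of the AC15 messages. The regular expression is written against the three messages in this mail only and is not a general OpenSM log parser. Whitespace is normalized first, so it does not matter whether the mail client wrapped the lines.

```python
import re

# The three AC15 messages from this mail, wrapped as a mail client might.
LOG = """
Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR
AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS
Level SL (3)
Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR
AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS
Level SL (2)
Nov 25 13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR
AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS
Level SL (1)
"""

# Pattern for the message text only; captures pkey, partition SL, QoS SL.
AC15 = re.compile(
    r"AC15: pkey (0x[0-9A-Fa-f]+) in match rule - "
    r"overriding partition SL \((\d+)\) with QoS Level SL \((\d+)\)"
)

# Collapse all runs of whitespace so wrapped messages match as one line.
flat = " ".join(LOG.split())

# Map each pkey to the QoS-level SL that the message says is applied.
pkey_to_sl = {
    pkey: int(qos_sl) for pkey, _partition_sl, qos_sl in AC15.findall(flat)
}

for pkey, sl in sorted(pkey_to_sl.items(), key=lambda item: item[1]):
    print(f"pkey {pkey} -> SL{sl}")
```

This lists pkey 0x7FFF on SL1, 0x0001 on SL2 and 0x0002 on SL3.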

Thanks for your help.

Vincent