Yevgeny,

> OK, so there are three possible reasons that I can think of:
> 1. Something is wrong in the configuration.
> 2. The application does not saturate the link, thus QoS
>    and the whole VL arbitration thing doesn't kick in.
> 3. There's some bug, somewhere.
>
> Let's start with reason no. 1.
> Please shut off each of the SLs one by one, and
> make sure that the application gets zero BW on
> these SLs. You can do it by mapping SL to VL15:
>
> qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15

If I shut down this SL by mapping it to VL15, the interfaces stop answering pings. This is probably because some IPoIB multicast traffic for pkey 0x7fff gets cut off, but I am not sure.
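To make the expected outcome of each shut-off test explicit, here is a minimal Python sketch. The helper names are hypothetical and not part of OpenSM. It builds the qos_sl2vl lines quoted in this thread and estimates the bandwidth split implied by the qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 table used in these tests. It assumes an identity SL-to-VL mapping apart from the disabled SL, that SLs 1 to 3 are all loaded on one saturated link at the same time, and plain weighted round-robin over the low-priority table. It ignores the high-priority table, qos_high_limit and packet granularity, so the numbers are rough targets only.

```python
# Hypothetical planning helper for the "shut off one SL at a time" test.
# Assumptions (not taken from OpenSM): identity SL->VL mapping apart from the
# disabled SL, and plain weighted round-robin over the low-priority VLArb
# table when every active VL is saturated.

NUM_SLS = 16
DROP_VL = 15  # mapping a data SL to VL15 shuts it off, as suggested in the thread


def sl2vl_with_sl_disabled(sl: int) -> list[int]:
    """Identity SL->VL table, except that `sl` is sent to VL15."""
    if not 0 <= sl < NUM_SLS:
        raise ValueError(f"SL must be in [0, {NUM_SLS}), got {sl}")
    return [DROP_VL if i == sl else i for i in range(NUM_SLS)]


def format_sl2vl(table: list[int]) -> str:
    """Render a table as a qos_sl2vl line like the ones quoted above."""
    return "qos_sl2vl " + ",".join(map(str, table))


def parse_vlarb(spec: str) -> dict[int, int]:
    """Parse a 'VL:weight,VL:weight,...' string into {vl: weight}."""
    weights = {}
    for entry in spec.split(","):
        vl, weight = entry.split(":")
        weights[int(vl)] = int(weight)
    return weights


def expected_shares(sl2vl: list[int], loaded_sls: list[int],
                    weights: dict[int, int]) -> dict[int, float]:
    """Idealized link share per loaded SL; SLs mapped to VL15 get 0.0."""
    per_vl: dict[int, list[int]] = {}  # active VL -> SLs carried on it
    for sl in loaded_sls:
        vl = sl2vl[sl]
        if vl != DROP_VL:
            per_vl.setdefault(vl, []).append(sl)
    total = sum(weights.get(vl, 0) for vl in per_vl)
    shares = {sl: 0.0 for sl in loaded_sls}
    if total == 0:
        return shares
    for vl, sls in per_vl.items():
        for sl in sls:
            # SLs sharing one VL are assumed to split that VL's share evenly.
            shares[sl] = weights.get(vl, 0) / total / len(sls)
    return shares


if __name__ == "__main__":
    vlarb_low = parse_vlarb("0:1,1:64,2:128,3:192,4:0,5:0")
    for disabled in (1, 2, 3):
        table = sl2vl_with_sl_disabled(disabled)
        print(format_sl2vl(table))
        shares = expected_shares(table, [1, 2, 3], vlarb_low)
        print("   ", {f"SL{sl}": round(s, 3) for sl, s in shares.items()})
```

For example, with SL2 mapped to VL15 the sketch predicts roughly 25% of the link for SL1, nothing for SL2 and 75% for SL3. If the configuration actually takes effect, the iperf numbers should show an asymmetry of that kind rather than three similar rates.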
So I have no results for this one.

> and then
> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15

With this setup and the following QoS settings:

qos_max_vls 8
qos_high_limit 1
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0
qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15

I get roughly the same values for SL1 to SL3:

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t 10 -P 8 2>&1; done | grep SUM
[SUM] 0.0-10.0 sec 6.15 GBytes 5.28 Gbits/sec
[SUM] 0.0-10.0 sec 6.00 GBytes 5.16 Gbits/sec
[SUM] 0.0-10.1 sec 5.38 GBytes 4.59 Gbits/sec

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone -t 10 -P 8 2>&1; done | grep SUM
[SUM] 0.0-10.0 sec 6.09 GBytes 5.23 Gbits/sec
[SUM] 0.0-10.0 sec 6.41 GBytes 5.51 Gbits/sec
[SUM] 0.0-10.0 sec 4.72 GBytes 4.05 Gbits/sec

[r...@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t 10 -P 8 2>&1; done | grep SUM
[SUM] 0.0-10.1 sec 6.96 GBytes 5.92 Gbits/sec
[SUM] 0.0-10.1 sec 5.89 GBytes 5.00 Gbits/sec
[SUM] 0.0-10.0 sec 5.35 GBytes 4.58 Gbits/sec

> and then
> qos_sl2vl 0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15

Same results as with the previous 0,1,15,3,... SL2VL mapping.

> If this part works well, then we will continue to
> reason no. 2.

In the above tests, I used -P 8 to force 8 threads on the client side for each test. I have one quad-core CPU (Intel E55400). This makes 24 iperf threads on 4 cores, which __should__ be fine (well, I suppose...).

Regarding reason #3, I still get the error I got yesterday, which you told me was not important because the SLs set in partitions.conf would override what was read from qos-policy.conf in the first place.
Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS Level SL (3)
Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS Level SL (2)
Nov 25 13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS Level SL (1)

Thanks for your help.

Vincent
