On 14 February 2018 01:43:25 CET, Ryan Hsu <ryan...@qti.qualcomm.com> wrote:
>On 02/02/2018 07:11 AM, Toke Høiland-Jørgensen wrote:
>
>> Since we now have the convenient helper to do so, actually adjust the
>> TSQ pacing shift for packets going out over a WiFi interface. This
>> significantly improves throughput for locally-originated TCP
>> connections. The default pacing shift of 10 corresponds to ~1ms of
>> queued packet data. Adjusting this to a shift of 8 (i.e. ~4ms) improves
>> 1-hop throughput for ath9k by a factor of 3, whereas increasing it more
>> has diminishing returns.
>>
>> Achieved throughput for different values of sk_pacing_shift (average of
>> 5 iterations of 10-sec netperf runs to a host on the other side of the
>> WiFi hop):
>>
>> sk_pacing_shift 10:  43.21 Mbps (pre-patch)
>> sk_pacing_shift  9:  78.17 Mbps
>> sk_pacing_shift  8: 123.94 Mbps
>> sk_pacing_shift  7: 128.31 Mbps
>>
>> Latency for competing flows increases from ~3 ms to ~10 ms with this
>> change. This is about the same magnitude of queueing latency induced by
>> flows that are not originated on the WiFi device itself (and so are not
>> limited by TSQ).
>>
>> Signed-off-by: Toke Høiland-Jørgensen <t...@toke.dk>
>> ---
>>  net/mac80211/tx.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
>> index 25904af38839..69722504e3e1 100644
>> --- a/net/mac80211/tx.c
>> +++ b/net/mac80211/tx.c
>> @@ -3574,6 +3574,14 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb,
>>      if (!IS_ERR_OR_NULL(sta)) {
>>              struct ieee80211_fast_tx *fast_tx;
>>  
>> +            /* We need a bit of data queued to build aggregates properly, so
>> +             * instruct the TCP stack to allow more than a single ms of data
>> +             * to be queued in the stack. The value is a bit-shift of 1
>> +             * second, so 8 is ~4ms of queued data. Only affects local TCP
>> +             * sockets.
>> +             */
>> +            sk_pacing_shift_update(skb->sk, 8);
>> +
>>              fast_tx = rcu_dereference(sta->fast_tx);
>>  
>>              if (fast_tx &&
>
>I know that going beyond a shift of 8 doesn't help much for ath9k, but in
>my testing on ath10k a shift of 6 or 7 gave the best results.
>Since ath10k/11ac devices have higher bandwidth than ath9k/11n, could we
>consider using 6 or 7 to accommodate that?
>
>sk_pacing_shift  tx (Mbps)  CPU usage (%)
>              5     404         28.5
>              6     398         13.8
>              7     401          8
>              8     378          5
>              9     230          4.5
>             10      79.6        2

Why does the CPU usage go up so sharply for shift values below 7? Also,
what is the latency impact of each of those values?

-Toke
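
For reference, the shift-to-time arithmetic from the commit message (the
value is a bit-shift of 1 second) can be sketched in a few lines of
userspace C. This is only an illustration: the function names are made up
and are not kernel API, and the byte budget ignores any other limits the
TCP stack applies on top of the shifted pacing rate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: time's worth of data a given pacing shift allows
 * to be queued, in microseconds (one second right-shifted by the shift).
 */
uint32_t pacing_shift_to_usec(unsigned int shift)
{
	assert(shift < 32);
	return 1000000u >> shift;
}

/* Hypothetical helper: the same budget expressed in bytes for a given
 * pacing rate in bytes per second. Other clamps in the stack are
 * deliberately left out of this sketch.
 */
uint64_t pacing_shift_to_bytes(uint64_t rate_bytes_per_sec, unsigned int shift)
{
	assert(shift < 64);
	return rate_bytes_per_sec >> shift;
}
```

With these, a shift of 10 gives 976 us, 8 gives 3906 us and 6 gives
15625 us, which lines up with the ~1ms and ~4ms figures in the commit
message.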
