On 02/02/2018 07:11 AM, Toke Høiland-Jørgensen wrote:

> Since we now have the convenient helper to do so, actually adjust the
> TSQ pacing shift for packets going out over a WiFi interface. This
> significantly improves throughput for locally-originated TCP
> connections. The default pacing shift of 10 corresponds to ~1ms of
> queued packet data. Adjusting this to a shift of 8 (i.e. ~4ms) improves
> 1-hop throughput for ath9k by a factor of 3, whereas increasing it more
> has diminishing returns.
>
> Achieved throughput for different values of sk_pacing_shift (average of
> 5 iterations of 10-sec netperf runs to a host on the other side of the
> WiFi hop):
>
> sk_pacing_shift 10:  43.21 Mbps (pre-patch)
> sk_pacing_shift  9:  78.17 Mbps
> sk_pacing_shift  8: 123.94 Mbps
> sk_pacing_shift  7: 128.31 Mbps
>
> Latency for competing flows increases from ~3 ms to ~10 ms with this
> change. This is about the same magnitude of queueing latency induced by
> flows that are not originated on the WiFi device itself (and so are not
> limited by TSQ).
>
> Signed-off-by: Toke Høiland-Jørgensen <t...@toke.dk>
> ---
>  net/mac80211/tx.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 25904af38839..69722504e3e1 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -3574,6 +3574,14 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb,
>       if (!IS_ERR_OR_NULL(sta)) {
>               struct ieee80211_fast_tx *fast_tx;
>  
> +             /* We need a bit of data queued to build aggregates properly, so
> +              * instruct the TCP stack to allow more than a single ms of data
> +              * to be queued in the stack. The value is a bit-shift of 1
> +              * second, so 8 is ~4ms of queued data. Only affects local TCP
> +              * sockets.
> +              */
> +             sk_pacing_shift_update(skb->sk, 8);
> +
>               fast_tx = rcu_dereference(sta->fast_tx);
>  
>               if (fast_tx &&

I understand that going beyond a shift of 8 doesn't help much for ath9k, but in
my testing on ath10k a shift of 6 or 7 turned out to be optimal.
Since ath10k/11ac devices have higher bandwidth than ath9k/11n, could we
consider using 6 or 7 to accommodate that?

sk_pacing_shift  tx (Mbps)  CPU usage (%)
 5               404        28.5
 6               398        13.8
 7               401         8
 8               378         5
 9               230         4.5
10                79.6       2
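
For reference, here is a rough sketch of the arithmetic behind these shift
values. The helper names are made up for illustration and are not kernel
functions. The byte figure assumes the TSQ budget is roughly the socket's
pacing rate shifted right by sk_pacing_shift, and ignores any other clamps
the stack applies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel API): the amount of queued data, in
 * milliseconds, implied by a pacing shift. The shift divides one second
 * by 2^shift.
 */
static inline double pacing_shift_to_ms(unsigned int shift)
{
	assert(shift < 32);
	return 1000.0 / (double)(1u << shift);
}

/* Hypothetical helper (not a kernel API): approximate byte budget for a
 * given pacing rate, assuming budget = rate >> shift.
 */
static inline uint64_t pacing_shift_to_bytes(uint64_t rate_bytes_per_sec,
					      unsigned int shift)
{
	assert(shift < 32);
	return rate_bytes_per_sec >> shift;
}
```

Under these assumptions a shift of 10 is about 0.98 ms of data, 8 is about
3.9 ms, 7 is about 7.8 ms and 6 is about 15.6 ms. At a pacing rate of
50,000,000 bytes/s (400 Mbps), a shift of 7 works out to 390625 bytes.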

I have a quad core machine.

$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel(R) Core(TM) i5-3380M CPU @ 2.90GHz

-- 
Ryan Hsu
