In addition to the SQM work ongoing... people keep improving the deployed TCPs... a multitude of improvements landed in Linux 3.14, notably these two caught my eye. usec resolution for tcp in 3.15, oy, vey!
I am planning on stabilizing my summer's testbed on 3.14 (and having a 3.4 box lying around for comparison)... unless any other cool patches arrive! I've seen a very similar problem (I think) on OSX, where the TCP cwnd stays stuck at 2*TSO on some kinds of paths where it shouldn't. I am under the impression that TCP small queue limits is not fully enabled by default due to some devices like wifi and non-bqled ethernet not working well. While the subsystems exist the driver writers are lagging... commit d10473d4e3f9d1b81b50a60c8465d6f59a095c46 Author: Eric Dumazet <[email protected]> Date: Sat Feb 22 22:25:57 2014 -0800 tcp: reduce the bloat caused by tcp_is_cwnd_limited() tcp_is_cwnd_limited() allows GSO/TSO enabled flows to increase their cwnd to allow a full size (64KB) TSO packet to be sent. Non GSO flows only allow an extra room of 3 MSS. For most flows with a BDP below 10 MSS, this results in a bloat of cwnd reaching 90, and an inflate of RTT. Thanks to TSO auto sizing, we can restrict the bloat to the number of MSS contained in a TSO packet (tp->xmit_size_goal_segs), to keep original intent without performance impact. Because we keep cwnd small, it helps to keep TSO packet size to their optimal value. Example for a 10Mbit flow, with low TCP Small queue limits (no more than 2 skb in qdisc/device tx ring) Before patch : lpk51:~# ./ss -i dst lpk52:44862 | grep cwnd cubic wscale:6,6 rto:215 rtt:15.875/2.5 mss:1448 cwnd:96 ssthresh:96 send 70.1Mbps unacked:14 rcv_space:29200 After patch : lpk51:~# ./ss -i dst lpk52:52916 | grep cwnd cubic wscale:6,6 rto:206 rtt:5.206/0.036 mss:1448 cwnd:15 ssthresh:14 send 33.4Mbps unacked:4 rcv_space:29200 commit 4a5ab4e224288403b0b4b6b8c4d339323150c312 Author: Eric Dumazet <[email protected]> Date: Thu Feb 6 15:57:10 2014 -0800 tcp: remove 1ms offset in srtt computation TCP pacing depends on an accurate srtt estimation. Current srtt estimation is using jiffie resolution, and has an artificial offset of at least 1 ms, which can produce slowdowns when FQ/pacing is used, especially in DC world, where typical rtt is below 1 ms. We are planning a switch to usec resolution for linux-3.15, but in the meantime, this patch removes the 1 ms offset. All we need is to have tp->srtt minimal value of 1 to differentiate the case of srtt being initialized or not, not 8. The problematic behavior was observed on a 40Gbit testbed, where 32 concurrent netperf were reaching 12Gbps of aggregate speed, instead of line speed. This patch also has the effect of reporting more accurate srtt and send rates to iproute2 ss command as in : $ ss -i dst cca2 Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port tcp ESTAB 0 0 10.244.129.1:56984 10.244.129.2:12865 cubic wscale:6,6 rto:200 rtt:0.25/0.25 ato:40 mss:1448 cwnd:10 send 463.4Mbps rcv_rtt:1 rcv_space:29200 tcp ESTAB 0 390960 10.244.129.1:60247 10.244.129.2:50204 cubic wscale:6,6 rto:200 rtt:0.875/0.75 mss:1448 cwnd:73 ssthresh:51 send 966.4Mbps unacked:73 retrans:0/121 rcv_space:29200 -- Dave Täht Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html _______________________________________________ Bloat mailing list [email protected] https://lists.bufferbloat.net/listinfo/bloat
