It turns out to be a limitation in the application. We used scp, and it (still) limits the bytes in flight; with our own application we indeed saw no such limit. Thanks for your response, and sorry for the noise...
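
For the record: scp runs the transfer over a single SSH channel, and the SSH protocol does its own per-channel flow control on top of TCP; stock OpenSSH uses a fixed channel window (around 2 MByte in recent versions, if we read the source correctly), which would explain the 1-2 MByte plateaus below. A quick way to take the application out of the picture is a plain TCP sender such as iperf3 (assuming it is installed on both hosts; addresses are the ones from our setup):

  # on the receiver (10.187.16.194 in our setup)
  iperf3 -s

  # on the sender: one TCP flow for 30 s, report every second;
  # -C selects the congestion control module (cubic/reno/dctcp)
  iperf3 -c 10.187.16.194 -t 30 -i 1 -C cubic

  # watch the flow on the sender while it runs; unacked * mss
  # approximates the bytes in flight
  watch -n 1 'ss -ti dst 10.187.16.194'

This should fill the 200 Mbps x 200 ms path with a single flow, provided tcp_wmem/tcp_rmem allow a window above the ~5 MByte bandwidth-delay product.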
Koen.

> -----Original Message-----
> From: Yuchung Cheng [mailto:ych...@google.com]
> Sent: Tuesday, 8 November 2016 5:51
> To: De Schepper, Koen (Nokia - BE) <koen.de_schepper@nokia-bell-labs.com>
> Cc: netdev@vger.kernel.org
> Subject: Re: Is there a maximum bytes in flight limitation in the tcp stack?
>
> On Thu, Nov 3, 2016 at 9:37 AM, De Schepper, Koen (Nokia - BE)
> <koen.de_schep...@nokia-bell-labs.com> wrote:
> >
> > Hi,
> >
> > We experience a limit on the maximum packets in flight which seems not
> > to be related to the receive or write buffers. Does somebody know if
> > there is an issue with a maximum of around 1 MByte (or sometimes
> > 2 MByte) of data in flight per TCP flow?
>
> does not ring a bell. I've definitely seen cubic reaching >2MB cwnd
> (inflight). some packet trace will help.
>
> btw, tcp_rmem is the maximum receive buffer including all header and
> control overhead. the receive window announced is (very roughly) half
> of your rcvbuf.
>
> >
> > It seems to be a strict and stable limit independent of the CC (tested
> > with Cubic, Reno and DCTCP). On a link of 200 Mbps and 200 ms RTT our
> > link is only 20% (sometimes 40%, see conditions below) utilized by a
> > single TCP flow, with no drops experienced at all (no bottleneck in the
> > AQM or RTT emulation, as it supports more throughput if multiple flows
> > are active).
> >
> > Some configuration changes we already tried on both client and server
> > (kernel 3.18.9):
> >
> > net.ipv4.tcp_no_metrics_save = 1
> > net.ipv4.tcp_rmem = 4096 87380 6291456
> > net.ipv4.tcp_wmem = 4096 16384 4194304
> >
> > SERVER# ss -i
> > tcp ESTAB 0 1049728 10.187.255.211:46642 10.187.16.194:ssh
> >     dctcp wscale:7,7 rto:408 rtt:204.333/0.741 ato:40 mss:1448
> >     cwnd:1466 send 83.1Mbps unacked:728 rcv_rtt:212 rcv_space:29200
> > CLIENT# ss -i
> > tcp ESTAB 0 288 10.187.16.194:ssh 10.187.255.211:46642
> >     dctcp wscale:7,7 rto:404 rtt:203.389/0.213 ato:40 mss:1448 cwnd:78
> >     send 4.4Mbps unacked:8 rcv_rtt:204 rcv_space:1074844
> >
> > When increasing the write and receive mem further (they were already
> > way above 1 or 2 MB), it steps to double (40%; 2 MBytes in flight):
> > net.ipv4.tcp_no_metrics_save = 1
> > net.ipv4.tcp_rmem = 4096 8000000 16291456
> > net.ipv4.tcp_wmem = 4096 8000000 16291456
> >
> > SERVER# ss -i
> > tcp ESTAB 0 2068976 10.187.255.212:54637 10.187.16.112:ssh
> >     cubic wscale:8,8 rto:404 rtt:202.622/0.061 ato:40 mss:1448
> >     cwnd:1849 ssthresh:1140 send 105.7Mbps unacked:1457 rcv_rtt:217.5
> >     rcv_space:29200
> > CLIENT# ss -i
> > tcp ESTAB 0 648 10.187.16.112:ssh 10.187.255.212:54637
> >     cubic wscale:8,8 rto:404 rtt:201.956/0.038 ato:40 mss:1448 cwnd:132
> >     send 7.6Mbps unacked:18 rcv_rtt:204 rcv_space:2093044
> >
> > Further increasing (x10) does not help anymore...
> > net.ipv4.tcp_no_metrics_save = 1
> > net.ipv4.tcp_rmem = 4096 80000000 162914560
> > net.ipv4.tcp_wmem = 4096 80000000 162914560
> >
> > As all these parameters autotune, it is hard to find out which one is
> > limiting... In the examples above, unacked does not want to go higher,
> > while the congestion window on the server is big enough... rcv_space
> > could be limiting, but it tunes up if I configure the server with the
> > higher buffers (switching to 2 MByte in flight).
> >
> > We also tried tcp_limit_output_bytes, setting it bigger (x10) and
> > smaller (/10), without effect. We've put it in /etc/sysctl.conf and
> > rebooted, to make sure that it is effective.
> >
> > Some more detailed tests that had an effect on the 1 or 2 MByte limit:
> > - It seems that with TSO off, if we configure a bigger wmem buffer, an
> >   ongoing flow is suddenly able to immediately double its bytes-in-flight
> >   limit. We configured further, up to more than 10x the buffer, but no
> >   further increase helps, and the limits we saw are only 1 MByte and
> >   2 MByte (no intermediate values depending on any parameter). When
> >   setting tcp_wmem smaller again, the 2 MByte limit stays on the ongoing
> >   flow. We have to restart the flow to make the buffer reduction to
> >   1 MByte effective.
> > - With TSO on, only the 2 MByte limit is effective, independent of the
> >   wmem buffer. We have to restart the flow to make a TSO change
> >   effective.
> >
> > Koen.
> >
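
PS: the utilization figures above are consistent with a hard cap on the bytes in flight: throughput = window / RTT, so on our 200 ms path a 1 MByte cap gives ~42 Mbps (~20% of 200 Mbps) and a 2 MByte cap ~84 Mbps (~40%), while filling the pipe needs a window of the full bandwidth-delay product:

  # bandwidth-delay product: 200 Mbit/s * 0.2 s / 8 bits/byte = 5 MByte
  echo '200 * 1000000 * 0.2 / 8' | bc -l        # -> 5000000 bytes

  # throughput from a 1 MByte in-flight cap: window / RTT
  echo '1048576 / 0.2 * 8 / 1000000' | bc -l    # -> ~41.9 Mbit/s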
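
PPS: for anyone reproducing the TSO experiments: the offload state can be checked and toggled per interface with ethtool (eth0 here stands for whatever interface carries the test traffic), and as noted above the flow has to be restarted before a change takes effect:

  # show the current segmentation-offload settings
  ethtool -k eth0 | grep -E 'tcp-segmentation|generic-segmentation'

  # turn TSO off (GSO too, since the stack falls back to it), or back on
  ethtool -K eth0 tso off gso off
  ethtool -K eth0 tso on gso on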