The thing you call "pacing" is something quite different. It is disconnected
from the TCP control loops involved, which basically means it is flying blind.
Introducing that kind of "pacing" almost certainly reduces throughput, because
it *delays* packets.
The thing I called "pacing" is in no version of Linux that I know of. Give it
a different name: "anti-bunching cooperation" or "timing phase management for
congestion reduction". Rather than *delaying* packets, it tries to keep packets
from bunching only when the window size is being reduced, and it does so by
tightening the control loop so that the sender transmits as *soon* as it can,
not by delaying transmissions after the sender has dallied instead of sending
when it could.
On Tuesday, May 27, 2014 11:23am, "Jim Gettys" <[email protected]> said:
On Sun, May 25, 2014 at 4:00 PM, <[email protected]> wrote:
Not that it is directly relevant, but there is no essential reason to require
50 ms. of buffering. That might be true of some particular QOS-related router
algorithm. 50 ms. is about all one can tolerate in any router between source
and destination for today's networks - an upper-bound rather than a minimum.
The optimum buffer state for throughput is 1-2 packets' worth - in other words,
if we have an MTU of 1500, 1500-3000 bytes. Only the bottleneck buffer (the
input queue to the lowest speed link along the path) should have this much
actually buffered. Buffering more than this increases end-to-end latency beyond
its optimal state. Increased end-to-end latency reduces the effectiveness of
control loops, creating more congestion.
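
A rough sketch of the arithmetic behind the 1-2 packet figure: the delay a queue adds is the time the link needs to drain what is already buffered. The link rates below are assumptions chosen for illustration, and the function name is hypothetical.

```python
def queueing_delay_seconds(buffered_bytes: int, link_bits_per_second: float) -> float:
    """Time the link needs to drain bytes that are already sitting in the queue."""
    return buffered_bytes * 8 / link_bits_per_second

MTU = 1500  # bytes, the example MTU used in the text

# Assumed link rates (not from the original message): 10 Mbit/s, 100 Mbit/s, 1 Gbit/s.
for rate in (10e6, 100e6, 1e9):
    for packets in (1, 2):
        delay = queueing_delay_seconds(packets * MTU, rate)
        print(f"{rate / 1e6:6.0f} Mbit/s, {packets} packet(s): {delay * 1e3:.3f} ms")
```

Even on the slowest assumed link, two full-size packets add only a few milliseconds, far below a 50 ms buffer.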
The rationale for having 50 ms. of buffering is probably to avoid disruption of bursty mixed flows
where the bursts might persist for 50 ms. and then die. One reason for this is that source nodes
run operating systems that tend to release packets in bursts. That's a whole other discussion - in
an ideal world, source nodes would avoid bursty packet releases by keeping the control exerted by the
receiver window "tight" timing-wise. That is, they would transmit a packet at the
instant an ACK arrives that opens the window. This would pace the flow. Current OS's tend (due to
scheduling mismatches) to send bursts of packets, "catching up" on sending that could
have been spaced out and done earlier if the feedback from the receiver's window advancing had been
heeded.
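
A toy illustration of the difference between sending on each ACK and "catching up" later. It assumes evenly spaced ACKs and a sender that only gets to run on a fixed scheduler tick. Both are simplifications made up for this sketch, not a model of any real OS or TCP stack.

```python
from collections import Counter

def ack_clocked_send_times(ack_times_us):
    """Tight control loop: one packet leaves at the instant each ACK arrives."""
    return list(ack_times_us)

def dallying_send_times(ack_times_us, tick_us):
    """Sender only runs on scheduler ticks, so each packet waits for the next tick."""
    return [-(-t // tick_us) * tick_us for t in ack_times_us]  # integer ceiling

def largest_burst(send_times_us):
    """Largest number of packets released at the same instant."""
    return max(Counter(send_times_us).values())

# Assumed numbers: 40 ACKs spaced 1 ms apart, a 10 ms scheduler tick.
acks_us = [i * 1000 for i in range(1, 41)]
tight = largest_burst(ack_clocked_send_times(acks_us))
dally = largest_burst(dallying_send_times(acks_us, 10_000))
print(f"ACK-clocked: bursts of {tight}; dallying: bursts of {dally}")
```

With these assumed numbers the ACK-clocked sender never releases more than one packet at a time, while the tick-driven sender releases ten back to back, although both send the same packets overall.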
That is, endpoint network stacks (TCP implementations) can worsen congestion by
"dallying". The ideal end-to-end flows occupying a congested router would have their
packets paced so that the packets end up being sent in the least bursty manner that an application
can support. The effect of this pacing is to move the "backlog" for each flow quickly
into the source node for that flow, which then provides back pressure on the application driving
the flow, which ultimately is necessary to stanch congestion. The ideal congestion control
mechanism slows the sender part of the application to a pace that can go through the network
without contributing to buffering.
Pacing is in Linux 3.12(?). How long it will take to see widespread
deployment is another question, and as for other operating systems, who knows.
See: [https://lwn.net/Articles/564978/](https://lwn.net/Articles/564978/)
Current network stacks (including Linux's) don't achieve that goal - their
pushback on application sources is minimal - instead they accumulate buffering
internal to the network implementation.
This is much, much less true than it once was. There have been substantial
changes in the Linux TCP stack in the last year or two, to avoid generating
packets before necessary. Again, how long it will take for people to deploy
this on Linux (and implement on other OS's) is a question.
This contributes to end-to-end latency as well. But if you think about it, this is
almost as bad as switch-level bufferbloat in terms of degrading user experience. The
reason I say "almost" is that there are tools, rarely used in practice, that
allow an application to specify that buffering should not build up in the network stack
(in the kernel or wherever it is). But the default is not to use those APIs, and to
buffer way too much.
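
The message does not name the APIs it has in mind. One standard socket option in this spirit is `SO_SNDBUF`, which caps the kernel send buffer for a socket. A minimal sketch, assuming that option is a fair example; the 16 KiB value and the helper name are arbitrary choices for illustration:

```python
import socket

def make_small_buffer_socket(send_buffer_bytes: int = 16384) -> socket.socket:
    """Create a TCP socket and ask the kernel for a small send buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request a cap on how much unsent data the kernel will queue for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_buffer_bytes)
    return sock

sock = make_small_buffer_socket()
# The kernel may round or adjust the request (Linux reports double the requested value).
effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print(f"effective send buffer: {effective} bytes")
sock.close()
```

With a small send buffer, a blocking `send()` stalls sooner, which is one way the stack can push back on the application instead of absorbing its data.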
Remember, the network send stack can act similarly to a congested switch (it is
a switch among all the user applications running on that node). If there is a
heavy file transfer, the file transfer's buffering acts to increase latency for
all other networked communications on that machine.
Traditionally this problem has been thought of only as a within-node fairness issue, but
in fact it has a big effect on the switches in between source and destination due to the
lack of dispersed pacing of the packets at the source - in other words, the current
design does nothing to stem the "burst groups" from a single source mentioned
above.
So we do need the source nodes to implement less "bursty" sending stacks. This
is especially true for multiplexed source nodes, such as web servers implementing
thousands of flows.
A combination of codel-style switch-level buffer management and the stack at
the sender being implemented to spread packets in a particular TCP flow out
over time would improve things a lot. To achieve best throughput, the optimal
way to spread packets out on an end-to-end basis is to update the receive
window (sending ACK) at the receive end as quickly as possible, and to respond
to the updated receive window as quickly as possible when it increases.
Just like the "bufferbloat" issue, the problem is caused by applications like
streaming video, file transfers and big web pages that the application programmer sees as
not having a latency requirement within the flow, so the application programmer does not
have an incentive to control pacing. Thus the operating system has got to push back on
the applications' flow somehow, so that the flow ends up paced once it enters the
Internet itself. So there's no real problem caused by large buffering in the network
stack at the endpoint, as long as the stack's delivery to the Internet is paced by some
mechanism, e.g. tight management of receive window control on an end-to-end basis.
I don't think this can be fixed by cerowrt, so this is out of place here. It's
partially ameliorated by cerowrt, if it aggressively drops packets from flows
that burst without pacing. fq_codel does this, if the buffer size it aims for
is small - but the problem is that the OS stacks don't respond by pacing...
they tend to respond by bursting, not because TCP doesn't provide the
mechanisms for pacing, but because the OS stack doesn't transmit as soon as it
is allowed to - thus building up a burst unnecessarily.
Bursts on a flow are thus bad in general. They make congestion happen when it
need not.
By far the biggest headache is what the Web does to the network. It has
turned the web into a burst generator.
A typical web page may have 10 (or even more) images. See the "connections per
page" plot in the link below.
A browser downloads the base page, and then, over N connections, essentially
simultaneously downloads those embedded objects. Many/most of them are small
in size (4-10 packets). You never even get near slow start.
So you get an IW amount of data/TCP connection, with no pacing, and no
congestion avoidance. It is easy to observe 50-100 packets (or more) back to
back at the bottleneck.
This is (in practice) the amount you have to buffer today: that burst of
packets from a web page. Without flow queuing, you are screwed. With it, it's
annoying, but can be tolerated.
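
A back-of-the-envelope sketch of that burst and of why flow queuing helps. The connection count, initial window, packet size and bottleneck rate are all assumptions picked for illustration. The round-robin figure is an idealized upper bound of one packet per competing flow, not a model of fq_codel itself.

```python
PACKET_BYTES = 1500      # assumed full-size packets
LINK_BPS = 10_000_000    # assumed 10 Mbit/s bottleneck

def burst_packets(connections: int, initial_window: int) -> int:
    """Packets arriving back to back when every connection sends its full IW."""
    return connections * initial_window

def drain_seconds(packets: int) -> float:
    """Time for the bottleneck link to transmit this many queued packets."""
    return packets * PACKET_BYTES * 8 / LINK_BPS

connections = 10     # assumed parallel connections for one page
initial_window = 10  # assumed IW, in packets

burst = burst_packets(connections, initial_window)
fifo_wait = drain_seconds(burst)      # a new packet queues behind the whole burst
rr_wait = drain_seconds(connections)  # round robin: at most one packet per other flow
print(f"burst: {burst} packets")
print(f"FIFO wait: {fifo_wait * 1e3:.0f} ms, per-flow round robin: {rr_wait * 1e3:.0f} ms")
```

With these assumed numbers the burst is 100 packets. A packet from an unrelated flow waits about 120 ms behind it in a single FIFO, but at most about 12 ms when each flow gets its own queue.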
I go over this in detail in:
[http://gettys.wordpress.com/2013/07/10/low-latency-requires-smart-queuing-traditional-aqm-is-not-enough/](http://gettys.wordpress.com/2013/07/10/low-latency-requires-smart-queuing-traditional-aqm-is-not-enough/)
So far, I don't believe anyone has tried pacing the IW burst of packets. I'd
certainly like to see that, but pacing needs to be across TCP connections (host
pairs) to be possibly effective to outwit the gaming the web has done to the
network.
- Jim
On Sunday, May 25, 2014 11:42am, "Mikael Abrahamsson" <[email protected]> said:
On Sun, 25 May 2014, Dane Medic wrote:
> Is it true that devices with less than 64 MB can't handle QOS? ->
> [https://lists.chambana.net/pipermail/commotion-dev/2014-May/001816.html](https://lists.chambana.net/pipermail/commotion-dev/2014-May/001816.html)
At gig speeds you need around 50ms worth of buffering. 1 gigabit/s =
125 megabyte/s meaning for 50ms you need 6.25 megabyte of buffer.
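
The same arithmetic as a small helper, in integer units to avoid rounding noise. The function name is hypothetical; the inputs are the figures quoted above.

```python
def buffer_bytes_for_delay(link_bits_per_second: int, delay_ms: int) -> int:
    """Bytes the link transmits in `delay_ms`, i.e. the buffer needed to hold that much."""
    return link_bits_per_second * delay_ms // 8000  # /8 bits per byte, /1000 ms per s

gigabit_buffer = buffer_bytes_for_delay(1_000_000_000, 50)
print(f"1 Gbit/s for 50 ms: {gigabit_buffer / 1e6:.2f} megabytes")
```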
I also don't see why performance and memory size would be relevant, I'd
say forwarding performance has more to do with CPU speed than anything
else.
--
Mikael Abrahamsson email: [email protected]
> _______________________________________________
Cerowrt-devel mailing list
[[email protected]](mailto:[email protected])
[https://lists.bufferbloat.net/listinfo/cerowrt-devel](https://lists.bufferbloat.net/listinfo/cerowrt-devel)
>