Note: this is all about "how to achieve and sustain the ballistic phase that is 
optimal for Internet transport" in an end-to-end based control system like TCP.
 
I think those who have followed this know that, but I want to make it clear 
that I'm proposing a significant improvement that requires changes to the OS 
stacks and to the switches' approach to congestion signaling.  There
are ways to phase it in gradually.  In "meshes", etc. it could probably be 
developed and deployed more quickly - but my thoughts on co-existence with the 
current TCP stacks and current IP routers are far less precisely worked out.
 
I am way too busy with my day job to do what needs to be done ... but my sense 
is that the folks who reduce this to practice will make a HUGE difference to 
Internet performance.  Bigger than getting bloat fixed, and to me that is a 
major, major potential triumph.
 


On Thursday, May 29, 2014 8:11am, "David P. Reed" <[email protected]> said:


ECN-style signaling has the right properties ... just like TTL it can provide 
valid and current sampling of the packet's environment as it travels. The 
idea is to sample what is happening at a bottleneck for the packet's flow.  
The bottleneck is the link with the most likelihood of a collision from flows 
sharing that link.

 A control-theoretic estimator of recent collision likelihood is easy to build 
at each queue.  All active flows would receive that signal, with the busiest 
ones getting it most quickly. It is also reasonable to count all potentially 
colliding flows at all outbound queues, and report that.

 The estimator can then provide the signal that each flow responds to.
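For concreteness, here is one way such a per-queue estimator could be sketched. This is an editor's illustration under assumed parameters (the EWMA gain, and "nonempty queue at arrival" as the collision proxy), not a worked-out design:

```python
# A toy sketch (illustrative, not any deployed AQM) of the per-queue
# estimator described above: an exponentially weighted moving average of
# how often an arriving packet finds other traffic queued ahead of it.
# The EWMA gain of 0.05 is an assumed tuning constant.
class CollisionEstimator:
    def __init__(self, gain=0.05):
        self.gain = gain   # EWMA gain: larger reacts faster, forgets sooner
        self.p = 0.0       # estimated recent collision likelihood

    def on_arrival(self, queue_depth):
        # A nonempty queue means this packet "collided" with other flows.
        busy = 1.0 if queue_depth > 0 else 0.0
        self.p += self.gain * (busy - self.p)
        return self.p      # usable directly as an ECN-mark probability

est = CollisionEstimator()
for depth in [0, 0, 3, 5, 4, 2, 1, 0]:   # sampled queue depths at arrivals
    mark_p = est.on_arrival(depth)
```

Busier flows see more arrivals per second, so they sample this signal more often and react sooner, which is the property asked for above.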

 The problem of "defectors" is best dealt with by punishment... An aggressive 
packet-drop policy that makes causing congestion reduce the causer's throughput 
and increase its latency is the best kind of answer. Since the router can 
remember recent flow behavior, it can penalize recently misbehaving flows.

 A Bloom-style filter can remember flow statistics for both of these local 
policies - a great use for the memory no longer misapplied to buffering.

 Simple?


On May 28, 2014, David Lang <[email protected]> wrote:
On Wed, 28 May 2014, [email protected] wrote:

I did not mean that "pacing".  Sorry I used a generic term.  I meant what my 
longer description described - a specific mechanism for reducing bunching that 
is essentially "cooperative" among all active flows through a bottlenecked 
link.  That's part of a "closed loop" control system driving each TCP endpoint 
into a cooperative mode.
How do you think we can get feedback from the bottleneck node to all the 
different senders?

What happens to the ones who try to play nice if one doesn't - including what 
happens if one isn't just ignorant of the new cooperative mode, but actively 
tries to cheat? (As I understand it, this is the fatal flaw in many past 
buffering-improvement proposals.)

While the in-house router is the first bottleneck that a user's traffic hits, 
the bigger problems happen when the bottleneck is in the peering between ISPs, 
many hops away from any sender, with many different senders competing for the 
available bandwidth.

This is where the new buffering approaches win. If the traffic is below the 
congestion level, they add very close to zero overhead, but when congestion 
happens, they manage the resulting buffers in a way that works better for 
people (allowing short, fast connections to be fast with only a small impact on 
very long connections).

David Lang

The thing you call "pacing" is something quite different.  It is disconnected 
from the TCP control loops involved, which basically means it is flying blind. 
Introducing that kind of "pacing" almost certainly reduces throughput, because 
it *delays* packets.

The thing I called "pacing" is in no version of Linux that I know of.  Give it 
a different name: "anti-bunching cooperation" or "timing phase management for 
congestion reduction". Rather than *delaying* packets, it tries to keep packets 
from bunching when the window is being reduced, and it does so by tightening 
the control loop so that the sender transmits as *soon* as it can, rather than 
dallying around not sending when it could.







On Tuesday, May 27, 2014 11:23am, "Jim Gettys" <[email protected]> said:







On Sun, May 25, 2014 at 4:00 PM, <[email protected]> 
wrote:

Not that it is directly relevant, but there is no essential reason to require 
50 ms. of buffering.  That might be true of some particular QoS-related router 
algorithm.  50 ms. is about all one can tolerate in any router between source 
and destination for today's networks - an upper bound rather than a minimum.

The optimum buffer state for throughput is 1-2 packets' worth - in other words, 
with an MTU of 1500, 1500-3000 bytes. Only the bottleneck buffer (the
input queue to the lowest speed link along the path) should have this much 
actually buffered. Buffering more than this increases end-to-end latency beyond 
its optimal state.  Increased end-to-end latency reduces the effectiveness of 
control loops, creating more congestion.

The rationale for having 50 ms. of buffering is probably to avoid disruption of 
bursty mixed flows where the bursts might persist for 50 ms. and then die. One 
reason for this is that source nodes run operating systems that tend to release 
packets in bursts. That's a whole other discussion - in an ideal world, source 
nodes would avoid bursty packet releases by keeping the control exercised by 
the receiver's window "tight" timing-wise.  That is, transmit a packet 
immediately at the instant an ACK arrives increasing the window.  This would 
pace the flow - current OSes tend (due to scheduling mismatches) to send bursts 
of packets, "catching up" on sending that could have been spaced out and done 
earlier if the feedback from the receiver's advancing window were heeded.
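The contrast between a "tight" ACK-driven sender and a dallying one can be seen in a toy timing model. This is an editor's illustration; the 1 ms ACK spacing and the 10 ms wakeup period are assumed numbers, standing in for ACK clocking and an OS scheduler tick:

```python
# Contrast a sender that transmits the instant each ACK opens the window
# with one that only wakes on a periodic scheduler tick and "catches up"
# by sending everything owed at once.
def send_times(ack_times, wake_every=None):
    if wake_every is None:
        return list(ack_times)          # tight: one send per ACK, same instant
    sends = []
    for t in ack_times:
        # dallying: the send waits for the next wakeup after the ACK
        wake = ((t // wake_every) + 1) * wake_every
        sends.append(wake)
    return sends

acks = [float(i) for i in range(20)]         # one ACK per millisecond
tight = send_times(acks)                     # preserves the 1 ms spacing
bursty = send_times(acks, wake_every=10.0)   # collapses into 10-packet bursts
```

The tight sender's transmissions inherit the ACK spacing; the dallying sender emits the same 20 packets as two back-to-back bursts of 10.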


That is, endpoint network stacks (TCP implementations) can worsen congestion by 
"dallying".  The ideal end-to-end flows occupying a congested router would have 
their packets paced so that the packets end up being sent in the least bursty 
manner that an application can support.  The effect of this pacing is to move 
the "backlog" for each flow quickly into the source node for that flow, which 
then provides back pressure on the application driving the flow, which 
ultimately is necessary to stanch congestion.  The ideal congestion control 
mechanism slows the sender part of the application to a pace that can go 
through the network without contributing to buffering.
Pacing is in Linux 3.12(?).  How long it will take to see widespread 
deployment is another question, and as for other operating systems, who knows.
See: https://lwn.net/Articles/564978/
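Alongside the fq qdisc pacing that article describes, later Linux kernels expose a per-socket rate cap, SO_MAX_PACING_RATE. A hedged sketch (the numeric fallback 47 is the Linux value of the constant; on platforms without the option, setsockopt simply fails):

```python
import socket

# Editor's sketch: SO_MAX_PACING_RATE caps a socket's transmit rate in
# bytes per second, enforced by the pacing machinery in the kernel.
# Fallback 47 is the Linux constant, used if Python doesn't expose it.
SO_MAX_PACING_RATE = getattr(socket, "SO_MAX_PACING_RATE", 47)

def set_pacing_rate(sock, bytes_per_sec):
    """Best-effort request to pace this socket's sends."""
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, bytes_per_sec)
        return True
    except OSError:
        return False   # kernel/platform doesn't support pacing

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ok = set_pacing_rate(s, 125_000_000)   # ~1 gigabit/s, in bytes/sec
s.close()
```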

Current network stacks (including Linux's) don't achieve that goal - their 
pushback on application sources is minimal - instead they accumulate buffering 
internal to the network implementation.
This is much, much less true than it once was.  There have been substantial 
changes in the Linux TCP stack in the last year or two, to avoid generating 
packets before necessary.  Again, how long it will take for people to deploy 
this on Linux (and implement it on other OSes) is a question.
This contributes to end-to-end latency as well.  But if you think about it, 
this is almost as bad as switch-level bufferbloat in terms of degrading user 
experience.  The reason I say "almost" is that there are tools, rarely used in 
practice, that allow an application to specify that buffering should not build 
up in the network stack (in the kernel or wherever it is).  But the default is 
not to use those APIs, and to buffer way too much.
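One such API, as an editor's sketch, is TCP_NOTSENT_LOWAT: it limits how much *unsent* data the kernel will queue for a connection, so the application feels back pressure instead of the stack buffering deeply. The fallback value 25 is the Linux constant; other platforms differ or lack the option, in which case setsockopt raises OSError:

```python
import socket

# With a low "not-sent low-water mark", write() blocks (or poll() withholds
# writability) once ~lowat_bytes of unsent data are queued, keeping the
# backlog in the application rather than in the kernel.
TCP_NOTSENT_LOWAT = getattr(socket, "TCP_NOTSENT_LOWAT", 25)

def limit_unsent(sock, lowat_bytes=16 * 1024):
    """Ask the kernel to keep at most ~lowat_bytes of unsent data queued."""
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, lowat_bytes)
        return True
    except OSError:
        return False   # option not supported on this platform

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
supported = limit_unsent(s)
s.close()
```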

Remember, the network send stack can act similarly to a congested switch (it is 
a switch among all the user applications running on that node).  If there is a 
heavy file transfer, the file transfer's buffering acts to increase latency for 
all other networked communications on that machine.

Traditionally this problem has been thought of only as a within-node fairness 
issue, but in fact it has a big effect on the switches in between source and 
destination due to the lack of dispersed pacing of the packets at the source - 
in other words, the current design does nothing to stem the "burst groups" from 
a single source mentioned above.

So we do need the source nodes to implement less "bursty" sending stacks.  This 
is especially true for multiplexed source nodes, such as web servers 
implementing thousands of flows.

A combination of codel-style switch-level buffer management and the stack at 
the sender being implemented to spread packets in a particular TCP flow out 
over time would improve things a lot.  To achieve best throughput, the optimal 
way to spread packets out on an end-to-end basis is to update the receive 
window (sending ACK) at the receive end as quickly as possible, and to respond 
to the updated receive window as quickly as possible when it increases.

Just like the "bufferbloat" issue, the problem is caused by applications like 
streaming video, file transfers and big web pages that the application 
programmer sees as not having a latency requirement within the flow, so the 
application programmer does not have an incentive to control pacing.  Thus the 
operating system has got to push back on the applications' flow somehow, so 
that the flow ends up paced once it enters the Internet itself.  So there's no 
real problem caused by large buffering in the network stack at the endpoint, as 
long as the stack's delivery to the Internet is paced by some mechanism, e.g. 
tight management of receive window control on an end-to-end basis.

I don't think this can be fixed by cerowrt, so this is out of place here.  It's 
partially ameliorated by cerowrt, if it aggressively drops packets from flows 
that burst without pacing. fq_codel does this, if the buffer size it aims for 
is small - but the problem is that the OS stacks don't respond by pacing... 
they tend to respond by bursting, not because TCP doesn't provide the 
mechanisms for pacing, but because the OS stack doesn't transmit as soon as it 
is allowed to - thus building up a burst unnecessarily.

Bursts on a flow are thus bad in general.  They make congestion happen when it 
need not.
By far the biggest headache is what the Web does to the network.  It has 
turned the web into a burst generator.
A typical web page may have 10 (or even more) images.  See the "connections per 
page" plot in the link below.
A browser downloads the base page, and then, over N connections, essentially 
simultaneously downloads those embedded objects.  Many/most of them are small 
in size (4-10 packets).  You never even get near slow start.
So you get an IW amount of data/TCP connection, with no pacing, and no 
congestion avoidance.  It is easy to observe 50-100 packets (or more) back to 
back at the bottleneck.
This is (in practice) the amount you have to buffer today: that burst of 
packets from a web page.  Without flow queuing, you are screwed.  With it, it's 
annoying, but can be tolerated.
I go over this in detail in:

http://gettys.wordpress.com/2013/07/10/low-latency-requires-smart-queuing-traditional-aqm-is-not-enough/
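The arithmetic behind those burst sizes is simple, using illustrative numbers: N simultaneous connections each emit an initial window (IW) of segments back to back, with no pacing between them.

```python
# Each of N parallel connections fires its initial window at once,
# so the bottleneck sees the product as one back-to-back burst.
def burst_packets(connections, initial_window):
    return connections * initial_window

web_page_burst = burst_packets(10, 10)   # 10 objects, IW10 (RFC 6928)
```

Ten objects over ten connections with IW10 already lands in the 50-100-packet range observed above.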
So far, I don't believe anyone has tried pacing the IW burst of packets.  I'd 
certainly like to see that, but pacing needs to be across TCP connections (host 
pairs) to have any chance of outwitting the gaming the web has done to the 
network.
- Jim








On Sunday, May 25, 2014 11:42am, "Mikael Abrahamsson" 
<[email protected]> said:



On Sun, 25 May 2014, Dane Medic wrote:

Is it true that devices with less than 64 MB can't handle QOS? ->
https://lists.chambana.net/pipermail/commotion-dev/2014-May/001816.html
At gig speeds you need around 50ms worth of buffering. 1 gigabit/s =
125 megabyte/s meaning for 50ms you need 6.25 megabyte of buffer.
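That arithmetic, spelled out (integer math to keep the figures exact):

```python
# buffer = rate x delay: convert bits/s to bytes and ms to seconds.
def buffer_bytes(rate_bits_per_sec, delay_ms):
    return rate_bits_per_sec * delay_ms // 8000

gig_50ms = buffer_bytes(1_000_000_000, 50)   # the 6.25 MB figure above
```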

I also don't see why performance and memory size would be relevant, I'd
say forwarding performance has more to do with CPU speed than anything
else.

--
Mikael Abrahamsson    email: [email protected]

Cerowrt-devel mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/cerowrt-devel

 


