On Fri, 3 Apr 2015, Jonathan Morton wrote:

I'd like them to put some sane upper bound on the RTT - one compatible
with satellite links, but which would avoid flooding unmanaged buffers to
multi-minute delays.

The problem is that there aren't any numbers that meet these two criteria.
Even if you ignore 10G and faster interfaces, a 1Gb/s interface
at satellite-sized latencies means a LOT of data in flight, far more than
is needed to flood a 'normal' link.
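
To put rough numbers on that, here is a back-of-the-envelope sketch. The 600 ms RTT (roughly a geostationary satellite hop) and the 10 Mb/s 'normal' link are assumed figures for illustration, not measurements from this thread.

```python
def bdp_bytes(rate_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return rate_bits_per_s * rtt_s / 8


# Assumed figures, for illustration only.
SAT_RTT_S = 0.6          # approximate geostationary satellite round trip
FAST_LINK_BPS = 1e9      # the 1Gb/s interface from the text
NORMAL_LINK_BPS = 10e6   # a hypothetical 'normal' bottleneck link

# Data needed in flight to fill 1Gb/s at satellite RTT.
in_flight = bdp_bytes(FAST_LINK_BPS, SAT_RTT_S)

# Time for that much data to drain if it lands in an unmanaged
# buffer in front of the slower link instead.
drain_time_s = in_flight * 8 / NORMAL_LINK_BPS

print(f"in flight: {in_flight / 1e6:.0f} MB, drain time: {drain_time_s:.0f} s")
```

Under those assumptions, a window sized for the fast satellite path is about a minute's worth of queue on the slower link.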

I very deliberately said "RTT", not "BDP".  TCP stacks already track an
estimate of RTT for various reasons, so in principle they could stop
increasing the congestion window when that RTT reaches some critical value
(1 second, say). The fact that they do not already do so is evidenced by
the observations of multi-minute induced delays in certain circumstances.
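
A minimal sketch of the RTT cap being proposed here, assuming a toy sender that grows its window by one segment per ACK. The class name, the 1-second limit, and the growth rule are all hypothetical; only the 1/8 smoothing gain is borrowed from the usual SRTT calculation (RFC 6298). This is not how any real TCP stack is structured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RttCappedWindow:
    """Toy sender state: stop growing cwnd once smoothed RTT hits a limit."""
    cwnd: int = 10                 # congestion window, in segments
    srtt: Optional[float] = None   # smoothed RTT estimate, in seconds
    rtt_limit: float = 1.0         # hypothetical cap ("1 second, say")
    alpha: float = 0.125           # EWMA gain for the RTT estimate

    def on_ack(self, rtt_sample: float) -> None:
        # Update the smoothed RTT from the new sample.
        if self.srtt is None:
            self.srtt = rtt_sample
        else:
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt_sample
        # Only grow the window while the path still looks uncongested.
        if self.srtt < self.rtt_limit:
            self.cwnd += 1


w = RttCappedWindow()
for _ in range(5):
    w.on_ack(0.1)      # short RTTs: window keeps growing
w.on_ack(20.0)         # one huge sample pushes srtt past the limit
print(w.cwnd, round(w.srtt, 4))
```

Note that this only freezes growth; it does nothing to shrink a window that is already far too large.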

I think the huge delays aren't because the RTT estimates are that long, but rather because early on the available bandwidth estimates were wildly high, since there was no feedback happening to indicate otherwise (the buffers were hiding it all).

Once you get into the collapse mode of operation, where you are sending multiple packets for every one that gets through, it's _really_ hard to recover short of just stopping for a while to let the junk clear.

If it was gradual degradation all the way down, then backing off a little bit would show clear improvement and feedback loops would clear things up fairly quickly. But when there is a cliff in the performance curve, and you go way beyond the cliff before you notice it (think Wile E. Coyote missing a turn in the road), you can't just step back to recover. When a whole group of people do the same thing, the total backoff that needs to happen for the network to recover is frequently significantly more than any one system's contribution to the problem. They all need to back off a lot.

And this is not a complete solution by any means. Vegas proved that an
altruistic limit on RTT by an endpoint, with no other measures within the
network, leads to poor fairness between flows. But if the major OSes did
that, more networks would be able to survive overload conditions while
providing some usable service to their users.

But we don't need to take such a risk. We have active queue management algorithms that we know will work if they are deployed on the chokepoint machines (for everything except wifi hops right now).

Best of all, these don't require any knowledge or guesswork about the overall network, and no knowledge of the RTT or bandwidth-latency product. All they need is information about the data flows going through the device and when the local link can accept more data.

Making decisions based on local data scales really well. Making estimates of the state of the network overall, not so much.
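
As a rough illustration of "local data only", here is a much-simplified queue that drops based on how long packets have sat in it. It is loosely in the spirit of CoDel but is not the CoDel algorithm; the class and method names are made up for this sketch, and the 5 ms target and 100 ms interval are just CoDel's commonly cited defaults.

```python
from collections import deque


class SojournQueue:
    """Toy AQM: drop when packets have queued too long for too long."""

    def __init__(self, target=0.005, interval=0.100):
        self.q = deque()          # entries are (enqueue_time, packet)
        self.target = target      # acceptable time spent in the queue
        self.interval = interval  # how long we tolerate exceeding target
        self.above_since = None   # when sojourn time first exceeded target
        self.drops = 0

    def enqueue(self, pkt, now):
        self.q.append((now, pkt))

    def dequeue(self, now):
        """Called when the local link can accept more data."""
        if not self.q:
            self.above_since = None
            return None
        t_in, pkt = self.q.popleft()
        if now - t_in < self.target:
            self.above_since = None      # queue is draining fine
            return pkt
        if self.above_since is None:
            self.above_since = now       # start timing the standing queue
        elif now - self.above_since >= self.interval and self.q:
            self.drops += 1              # drop pkt as a congestion signal
            self.above_since = now       # at most one drop per interval
            _, pkt = self.q.popleft()    # send the next packet instead
        return pkt


q = SojournQueue()
for i in range(10):
    q.enqueue(i, now=0.0)
print(q.dequeue(0.010), q.dequeue(0.050), q.dequeue(0.120), q.drops)
```

Everything it uses is a local timestamp and the local clock; nothing depends on the RTT, the bandwidth, or what the rest of the path looks like.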

David Lang
_______________________________________________
Bloat mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/bloat
