That Flakeway tool sounds like an early precursor of Netflix's Chaos Monkey. On that note, Apple maintains a developer tool called Network Link Conditioner that does a good job of simulating degraded network conditions (added latency, packet loss, constrained bandwidth).
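For anyone who wants to poke at the same idea without Apple's tooling, here is a rough sketch in plain Python of a Flakeway-style UDP relay that drops, delays, and duplicates datagrams passing through it. The addresses and probabilities are made up for illustration; this is a toy, not a faithful reimplementation of the BBN tool.

import random
import socket
import time

# Illustrative values only -- tune to taste.
LISTEN_ADDR = ("127.0.0.1", 9000)   # point the client here instead of the real server
TARGET_ADDR = ("127.0.0.1", 9001)   # the real server
DROP_P, DELAY_P, DUP_P = 0.05, 0.10, 0.02
MAX_DELAY_S = 0.5

def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(LISTEN_ADDR)
    while True:
        data, _src = sock.recvfrom(65535)
        r = random.random()
        if r < DROP_P:
            continue                                 # silently discard
        if r < DROP_P + DELAY_P:
            # Crude: sleeping here also holds back whatever arrives next,
            # which produces the bursty delivery real paths tend to show.
            time.sleep(random.uniform(0.0, MAX_DELAY_S))
        sock.sendto(data, TARGET_ADDR)
        if random.random() < DUP_P:
            sock.sendto(data, TARGET_ADDR)           # occasional duplicate

if __name__ == "__main__":
    main()

It only handles one direction and knows nothing about ARP, unlike the real Flakeway described below, but it is enough to watch how a transport reacts to loss, delay, and duplication.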
> On Oct 15, 2023, at 23:30, Jack Haverty via Nnagain <nnagain@lists.bufferbloat.net> wrote:
>
> Even back in 1978, I didn't think Source Quench would work. I recall that I was trying to adapt my TCP2.5 Unix implementation to become TCP4, and I asked what my TCP should do if it sent the first IP datagram to open a TCP connection and received a Source Quench. It wasn't clear at all how I should "slow down". Other TCP implementors took the receipt of an SQ as an indication that a datagram they had sent had been discarded, so the obvious reaction for user satisfaction was to retransmit immediately. Slowing down would simply degrade their user's experience.
>
> Glad to hear SQ is gone. I hope whatever replaced it works.
>
> There's some confusion about the Arpanet. The Arpanet was known as a "packet switching network", but it had lots of internal mechanisms that essentially created virtual circuits between attached computers. Every packet sent into the network by a user computer came out at the destination intact, in order, and not duplicated or lost. The Arpanet switches even had a hardware mechanism for flow control; a switch could halt data transfer from a user computer when necessary. During the 80s, the Arpanet evolved to have an X.25 interface, and operated as a true "virtual circuit" provider. Even in the Defense Data Network (DDN), the network delivered a virtual circuit service. The attached users' computers had TCP, but the TCP didn't need to deal with most of the network behavior that TCP was designed to handle. Congestion was similarly handled by internal Arpanet mechanisms (there were several technical reports from BBN to ARPA with details). I don't remember any time that "an explicit ack for every packet was ripped out of the arpanet". None of those events happened when two TCP computers were connected to the Arpanet.
>
> The Internet grew up around the Arpanet, which provided most of the wide-area connectivity through the mid-80s. Since the Arpanet provided the same "reliable byte stream" behavior as TCP provided, and most user computers were physically attached to an Arpanet switch, it wasn't obvious how to test a TCP implementation to see how well it dealt with reordering, duplication, dropping, or corruption of IP datagrams.
>
> We (at BBN) actually had to implement a software package called a "Flakeway", which ran on a SparcStation. Using a "feature" of Ethernets and ARP (some would call it a vulnerability), the Flakeway could insert itself invisibly in the stream of datagrams between any two computers on that LAN (e.g., between a user computer and the gateway/router providing a path to other sites). The Flakeway could then simulate "real" Internet behavior by dropping, duplicating, reordering, mangling, delaying, or otherwise interfering with the flow. That was extremely useful in testing and diagnosing TCP implementations.
>
> I understand that there has been a lot of technical work over the years, and lots of new mechanisms defined for use in the Internet to solve various problems. But one issue has not been addressed: how do you know whether or not some such mechanism has actually been implemented, and configured correctly, in the millions of devices that are now using TCP (and UDP, IP, etc.)? AFAIK, there's no way to tell unless you can examine the actual code.
>
> The Internet, and TCP, was an experiment. One aspect of that experiment involved changing the traditional role of a network "switch", moving the mechanisms for flow control, error control, and the other functions used to create "virtual circuit" behavior. Instead of being implemented inside some switching equipment, TCP's mechanisms are implemented inside users' computers. That was a significant break from traditional network architecture.
>
> I didn't realize it at the time, but now, with users' devices being uncountable handheld or desktop computers rather than huge racks in relatively few data centers, moving all those mechanisms from switches to users' computers significantly complicates the system design and especially operation.
>
> That may be one of the more important results of the long-running experiment.
>
> Jack Haverty
>
> On 10/15/23 18:39, Dave Taht wrote:
>> It is wonderful to have your original perspectives here, Jack.
>>
>> But please, everyone, before a major subject change, change the subject?
>>
>> Jack's email conflates a few things that probably deserve other threads for them. One is VGV - great acronym! Another is about the "Placeholders" of TTL and TOS. The last is the history of congestion control - and its future! Having been part of the most recent episodes here, I have written extensively on the subject, but what I most like to point people to is my fun talks trying to make it more accessible, like this one at APNIC: https://blog.apnic.net/2020/01/22/bufferbloat-may-be-solved-but-its-not-over-yet/ or my more recent one at tti/vanguard.
>>
>> Most recently one of our LibreQos clients has been collecting 10ms samples and movies of what real-world residential traffic actually looks like:
>>
>> https://www.youtube.com/@trendaltoews7143
>>
>> And it is my hope that that conveys intuition to others... as compared to speedtest traffic, which proves nothing about the actual behaviors of VGV traffic, which I ranted about here: https://blog.cerowrt.org/post/speedtests/ - I am glad that these speedtests now have latency-under-load reports almost universally, but see the rant for more detail.
>>
>> Most people only have a picture of traffic in the large, over 5 minute intervals, which behaves quite differently, or a pre-conception that backpressure actually exists across the internet. It doesn't. An explicit ack for every packet was ripped out of the arpanet as costing too much time. Wifi, to some extent, recreates the arpanet problem by having explicit acks on the local loop that are repeated until by god the packet comes through, usually without exponential backoff.
>>
>> We have some really amazing encoding schemes now - I do not understand how starlink works without retries for example, and my grip on 5G's encodings is non-existent, except knowing it is the most bufferbloated of all our technologies.
>>
>> ...
>>
>> Anyway, my hope for this list is that we come up with useful technical feedback to the powers-that-be that want to regulate the internet under some title ii provisions, and I certainly hope we can make strides towards fixing bufferbloat along the way! There are many other issues. Let's talk about those instead!
>>
>> But...
>> ...
>>
>> In "brief" response to the notes below - source quench died due to easy ddos; AQMs from RED (1992) until codel (2012) struggled with measuring the wrong things (Kathie's updated paper on RED in a different light: https://pollere.net/Codel.html); SFQ was adopted by many devices, WRR used in others, ARED I think is common in juniper boxes, fq_codel is pretty much the default now for most of linux, and I helped write CAKE.
>>
>> TCPs evolved from reno to vegas to cubic to bbr, and the paper on BBR is excellent: https://research.google/pubs/pub45646/ as is Len Kleinrock's monograph on it. However, problems with self-congestion and excessive packet loss were observed, and after entering the IETF process, it is now in its 3rd revision, which looks pretty good.
>>
>> Hardware pause frames in ethernet are often available, there are all kinds of specialized new hardware flow control standards in 802.1, and a new, more centralized controller in wifi7.
>>
>> To this day I have no idea how infiniband works. Or how ATM was supposed to work. I have a good grip on wifi up to version 6, and the work we did on wifi is in use now on a lot of wifi gear like openwrt, eero and evenroute. I am proudest of all my teams' work on achieving airtime fairness, and the better scheduling described in this paper here: https://www.cs.kau.se/tohojo/airtime-fairness/ for wifi - and MOS to die for.
>>
>> There is new work on this thing called L4S, which has a bunch of RFCs for it, leverages multi-bit DCTCP-style ECN, and is under test by apple and comcast; it is discussed on the tsvwg list a lot. I encourage users to jump in on the comcast/apple beta, and operators to at least read this: https://datatracker.ietf.org/doc/draft-ietf-tsvwg-l4sops/
>>
>> Knowing that there is a book or three left to write on this subject that nobody will read is an issue, as is coming up with an architecture to take packet handling as we know it to the moon and the rest of the solar system, which seems kind of difficult.
>>
>> Ideally I would love to be working on that earth-moon architecture rather than trying to finish getting stuff we designed in 2012-2016 deployed.
>>
>> I am going to pull out a few specific questions from the below and answer separately.
>>
>> On Sun, Oct 15, 2023 at 1:00 PM Jack Haverty via Nnagain <nnagain@lists.bufferbloat.net> wrote:
>>> The "VGV User" (Voice, Gaming, Videoconferencing) cares a lot about latency. It's not just "rewarding" to have lower latencies; high latencies may make VGV unusable. Average (or "typical") latency as the FCC label proposes isn't a good metric to judge usability. A path which has high variance in latency can be unusable even if the average is quite low. Having your voice or video or gameplay "break up" every minute or so when latency spikes to 500 msec makes the "user experience" intolerable.
>>>
>>> A few years ago, I ran some simple "ping" tests to help a friend who was trying to use a gaming app. My data was only for one specific path so it's anecdotal. What I saw was surprising - zero data loss, every datagram was delivered, but occasionally a datagram would take up to 30 seconds to arrive. I didn't have the ability to poke around inside, but I suspected it was an experience of "bufferbloat", enabled by the dramatic drop in price of memory over the decades.
>>>
>>> It's been a long time since I was involved in operating any part of the Internet, so I don't know much about the inner workings today. Apologies for my ignorance....
>>>
>>> There was a scenario in the early days of the Internet for which we struggled to find a technical solution. Imagine some node in the bowels of the network, with 3 connected "circuits" to some other nodes. On two of those inputs, traffic is arriving to be forwarded out the third circuit. The incoming flows are significantly more than the outgoing path can accept.
>>>
>>> What happens? How is "backpressure" generated so that the incoming flows are reduced to the point that the outgoing circuit can handle the traffic?
>>>
>>> About 45 years ago, while we were defining TCPV4, we struggled with this issue, but didn't find any consensus solutions. So "placeholder" mechanisms were defined in TCPV4, to be replaced as research continued and found a good solution.
>>>
>>> In that "placeholder" scheme, the "Source Quench" (SQ) IP message was defined; it was to be sent by a switching node back toward the sender of any datagram that had to be discarded because there wasn't any place to put it.
>>>
>>> In addition, the TOS (Type Of Service) and TTL (Time To Live) fields were defined in IP.
>>>
>>> TOS would allow the sender to distinguish datagrams based on their needs. For example, we thought "Interactive" service might be needed for VGV traffic, where timeliness of delivery was most important. "Bulk" service might be useful for activities like file transfers, backups, et al. "Normal" service might now mean activities like using the Web.
>>>
>>> The TTL field was an attempt to inform each switching node about the "expiration date" for a datagram. If a node somehow knew that a particular datagram was unlikely to reach its destination in time to be useful (such as a video datagram for a frame that has already been displayed), the node could, and should, discard that datagram to free up resources for useful traffic. Sadly we had no mechanisms for measuring delay, either in transit or in queuing, so TTL was defined in terms of "hops", which is not an accurate proxy for time. But it's all we had.
>>>
>>> Part of the complexity was that the "flow control" mechanism of the Internet had put much of the mechanism in the users' computers' TCP implementations, rather than in the switches, which handle only IP. Without mechanisms in the users' computers, all a switch could do was order more circuits and add more memory to the switches for queuing. Perhaps that led to "bufferbloat".
>>>
>>> So TOS, SQ, and TTL were all placeholders, for some mechanism in a future release that would introduce a "real" form of backpressure and the ability to handle different types of traffic. Meanwhile, these rudimentary mechanisms would provide some flow control. Hopefully the users' computers sending the flows would respond to the SQ backpressure, and switches would prioritize traffic using the TTL and TOS information.
>>>
>>> But, being way out of touch, I don't know what actually happens today. Perhaps the current operators and current government watchers can answer?
>>
>> I would love more feedback about RED's deployment at scale in particular.
>>
>>> 1/ How do current switches exert backpressure to reduce competing traffic flows? Do they still send SQs?
>> Some send various forms of hardware flow control, an ethernet pause frame derivative.
>>
>>> 2/ How do the current and proposed government regulations treat the different needs of different types of traffic, e.g., "Bulk" versus "Interactive" versus "Normal"? Are Internet carriers permitted to treat traffic types differently? Are they permitted to charge different amounts for different types of service?
>>>
>>> Jack Haverty
>>>
>>> On 10/15/23 09:45, Dave Taht via Nnagain wrote:
>>>> For starters I would like to apologize for cc-ing both nanog and my new nn list. (I will add sender filters)
>>>>
>>>> A bit more below.
>>>>
>>>> On Sun, Oct 15, 2023 at 9:32 AM Tom Beecher <beec...@beecher.cc> wrote:
>>>>>> So for now, we'll keep paying for transit to get to the others (since it's about as much as transporting IXP from Dallas), and hoping someone at Google finally sees Houston as more than a third rate city hanging off of Dallas. Or… someone finally brings a worthwhile IX to Houston that gets us more than peering to Kansas City. Yeah, I think the former is more likely. 😊
>>>>>
>>>>> There is often a chicken/egg scenario here with the economics. As an eyeball network, your costs to build out and connect to Dallas are greater than your transit cost, so you do that. Totally fair.
>>>>>
>>>>> However, think about it from the content side. Say I want to build into Houston. I have to put routers in, and a bunch of cache servers, so I have capital outlay, plus opex for space, power, and IX/backhaul/transit costs. That's not cheap, so there are a lot of calculations that go into it. Is there enough total eyeball traffic there to make it worth it? Is saving 8-10ms enough of a performance boost to justify the spend? What are the long term trends in that market? These answers are of course different for a company running their own CDN vs the commercial CDNs.
>>>>>
>>>>> I don't work for Google and obviously don't speak for them, but I would suspect that they're happy to eat an 8-10ms performance hit to serve from Dallas, versus the amount of capital outlay to build out there right now.
>>>>
>>>> The three forms of traffic I care most about are voip, gaming, and videoconferencing, which are rewarding to have at lower latencies. When I was a kid, we had switched phone networks, and while the sound quality was poorer than today, the voice latency cross-town was just like "being there". Nowadays we see 500+ms latencies for this kind of traffic.
>>>>
>>>> As to how to make calls across town work that well again, cost-wise, I do not know, but the volume of traffic that would be better served by these interconnects is quite low relative to the overall gains in lower-latency experiences for them.
>>>>
>>>>> On Sat, Oct 14, 2023 at 11:47 PM Tim Burke <t...@mid.net> wrote:
>>>>>> I would say that a 1Gbit IP transit in a carrier neutral DC can be had for a good bit less than $900 on the wholesale market.
>>>>>>
>>>>>> Sadly, IXPs are seemingly turning into a pay to play game, with rates almost costing as much as transit in many cases after you factor in loop costs.
>>>>>>
>>>>>> For example, in the Houston market (one of the largest and fastest growing regions in the US!), we do not have a major IX, so to get up to Dallas it's several thousand for a 100g wave, plus several thousand for a 100g port on one of those major IXes. Or, a better option, we can get a 100g flat internet transit for just a little bit more.
>>>>>>
>>>>>> Fortunately, for us as an eyeball network, there are a good number of major content networks that are allowing for private peering in markets like Houston for just the cost of a cross connect and a QSFP if you're in the right DC, with Google and some others being the outliers.
>>>>>>
>>>>>> So for now, we'll keep paying for transit to get to the others (since it's about as much as transporting IXP from Dallas), and hoping someone at Google finally sees Houston as more than a third rate city hanging off of Dallas. Or… someone finally brings a worthwhile IX to Houston that gets us more than peering to Kansas City. Yeah, I think the former is more likely. 😊
>>>>>>
>>>>>> See y'all in San Diego this week,
>>>>>> Tim
>>>>>>
>>>>>> On Oct 14, 2023, at 18:04, Dave Taht <dave.t...@gmail.com> wrote:
>>>>>>> This set of trendlines was very interesting. Unfortunately the data stops in 2015. Does anyone have more recent data?
>>>>>>>
>>>>>>> https://drpeering.net/white-papers/Internet-Transit-Pricing-Historical-And-Projected.php
>>>>>>>
>>>>>>> I believe a gbit circuit that an ISP can resell still runs at about $900 - $1.4k (?) in the usa? How about elsewhere?
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> I am under the impression that many IXPs remain very successful, states without them suffer, and I also find the concept of doing micro IXPs at the city level appealing, and now achievable with cheap gear. Finer-grained cross connects between telco and ISP and IXP would lower latencies across town quite hugely...
>>>>>>>
>>>>>>> PS I hear ARIN is planning on dropping the price for, and bundling, 3 BGP AS numbers at a time, as of the end of this year, also.
>>>>>>>
>>>>>>> --
>>>>>>> Oct 30: https://netdevconf.info/0x17/news/the-maestro-and-the-music-bof.html
>>>>>>> Dave Täht CSO, LibreQos
>>>>
>>> _______________________________________________
>>> Nnagain mailing list
>>> Nnagain@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/nnagain
>>
>
> _______________________________________________
> Nnagain mailing list
> Nnagain@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/nnagain
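P.S. One footnote on Dave's point above about AQMs measuring the wrong thing: the shift in codel was to act on how long each packet has sat in the queue (its sojourn time) rather than on how long the queue is. Below is a much-simplified Python sketch of that control law, loosely after RFC 8289; the constants are the usual defaults, the class and method names are mine, and the refinements in the RFC (count decay, lazy re-entry, and so on) are omitted.

import math
import time

TARGET = 0.005     # 5 ms of acceptable standing-queue delay
INTERVAL = 0.100   # 100 ms observation window

class CodelSketch:
    """Per dequeued packet, decide whether to drop it based on its
    sojourn time (time spent queued), not on queue length."""

    def __init__(self):
        self.first_above = None   # when sojourn time will have stayed above TARGET long enough
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0

    def should_drop(self, sojourn, now=None):
        now = time.monotonic() if now is None else now
        if sojourn < TARGET:
            # Queue drained below target: leave the dropping state entirely.
            self.first_above = None
            self.dropping = False
            return False
        if self.first_above is None:
            self.first_above = now + INTERVAL
            return False
        if not self.dropping:
            if now < self.first_above:
                return False
            # Delay stayed above TARGET for a whole INTERVAL: start dropping.
            self.dropping = True
            self.count = 1
        elif now < self.drop_next:
            return False
        else:
            self.count += 1
        # Each successive drop comes sooner, until the delay falls again.
        self.drop_next = now + INTERVAL / math.sqrt(self.count)
        return True

Real implementations such as fq_codel and CAKE wrap this in per-flow queues, but the sojourn-time idea is the part that made it deployable without tuning knobs.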
_______________________________________________
Nnagain mailing list
Nnagain@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/nnagain