(Awesome development - I have a computer with a sane e-mail client again.  One 
that doesn’t assume I want to top-post if I quote anything at all, *and* lets 
me type with an actual keyboard.  Luxury!)

>> One of the features well observed in real measurements of real systems is 
>> that packet flows are "fractal", which means that there is a self-similarity 
>> of rate variability at all time scales from micro to macro.
> 
> I remember this debate and its evolution, Hurst parameters and all that.  I 
> also understand that a collection of on/off Poisson sources looks fractal - I 
> found that “the universe is fractal - live with it” ethos to be of limited 
> practical use (except to help people say it was not solvable).

>> Designers may imagine that their networks have "smooth averaging" 
>> properties. There's a strong thread in networking literature that makes this 
>> pretty-much-always-false assumption the basis of protocol designs, thinking 
>> about "Quality of Service" and other sorts of things. You can teach graduate 
>> students about a reality that does not exist, and get papers accepted in 
>> conferences where the reviewers have been trained in the same tradition of 
>> unreal assumptions.
> 
> Agreed - there is a massive disconnect between a lot of the literature (and 
> the people who make their living generating it - [to those people, please 
> don’t take offence: queueing theory is really useful, it is just that the 
> real world is a lot more non-stationary than you model]) and reality.

Probably a lot of theoreticians would be horrified at the extent to which I 
ignored mathematics and relied on intuition (and observations of real traffic, 
i.e. eating my own dogfood) while building Cake.

That approach, however, led me to some novel algorithms and combinations 
thereof which seem to work well in practice, as well as to some practical 
observations about the present state of the Internet.  I’ve also used some 
contributions from others, but only where they made sense at an intuitive level.

However, Cake isn’t designed to work in the datacentre.  Nor is it likely to 
work optimally in an ISP’s core network.  The combination of features in Cake 
is not optimised for those environments, but rather for last-mile links, which 
are typically the bottlenecks experienced by ordinary Internet users.  Some of 
Cake's algorithms could reasonably be reused in a different combination for a 
different environment.

> I see large scale (i.e. public internets) not as a mono-service but as a 
> “poly service” - there are multiple demands for timeliness etc that exist out 
> there for “real services”.

This is definitely true.  However, the biggest problem I’ve noticed is with 
distinguishing these traffic types from each other.  In some cases there are 
heuristics which are accurate enough to be useful.  In others, there are not.  
Rarely is the distinction *explicitly* marked in any way, and some common 
protocols deliberately obscure themselves due to historical mistreatment.

Diffserv is very hard to use in practice.  There’s a controversial fact for you 
to chew on.
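
To make that a little less abstract, here's a toy Python sketch of the sort of 
heuristic I mean.  It's nothing from Cake's actual code - apart from the 
standard DSCP codepoints, every name and threshold here is made up purely for 
illustration:

# Toy classifier: trust an explicit DSCP marking if present, otherwise
# fall back to a crude packet-size/rate heuristic.  Thresholds invented.

LATENCY_SENSITIVE_DSCP = {46, 40}   # EF and CS5, commonly used for voice/interactive

def classify(dscp, mean_pkt_len, pkts_per_sec):
    """Best-effort guess at whether a flow is interactive or bulk."""
    if dscp in LATENCY_SENSITIVE_DSCP:
        return "interactive"        # explicit marking, when it survives the path
    # Heuristic fallback: small, sparse packets usually mean interactive
    # traffic; large back-to-back packets usually mean a bulk transfer.
    if mean_pkt_len < 300 and pkts_per_sec < 100:
        return "interactive"
    return "bulk"

print(classify(dscp=0, mean_pkt_len=120,  pkts_per_sec=50))   # interactive
print(classify(dscp=0, mean_pkt_len=1400, pkts_per_sec=800))  # bulk

In practice the explicit marking is often absent or bleached by the time 
packets reach the last mile, so the crude fallback ends up doing most of the 
work - which is roughly what I mean by Diffserv being hard to use.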

> We’ve worked with people who have created risks for Netflix delivery 
> (accidentally I might add - they thought they were doing “the right thing”) by 
> increasing their network infrastructure to 100G delivery everywhere. That 
> change (combined with others made by CDN people - TCP offload engines) 
> created so much non-stationarity in the load as to cause delay and loss 
> spikes that *did* cause VoD playout buffers to empty.  This is an example of 
> where “more capacity” produced worse outcomes.

That’s an interesting and counter-intuitive result.  I’ll hazard a guess that 
it had something to do with burst loss in dumb tail-drop FIFOs?  Offload 
engines tend to produce extremely bursty traffic which - with a nod to another 
thread presently ongoing - makes a mockery of any ack-clocking or pacing which 
TCP designers normally assume is in effect.
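
To illustrate the guess - and it is only a guess about that incident - here's 
a toy Python sketch of a tail-drop FIFO taking an offload-sized burst; all the 
numbers are invented:

# Toy illustration of burst loss: a 64-packet burst arrives back-to-back
# into a FIFO that holds only 40 packets, with no chance to drain.

QUEUE_LIMIT = 40      # FIFO capacity in packets (invented)
queue, drops = [], []

for seq in range(64):                 # the burst, arriving effectively at once
    if len(queue) < QUEUE_LIMIT:
        queue.append(seq)
    else:
        drops.append(seq)             # tail drop: everything past the limit is lost

print("dropped:", drops)
# Packets 40..63 are lost *consecutively* - exactly the kind of burst loss
# that breaks TCP's assumption of isolated, well-spaced congestion signals.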

One of the things that fq_codel and Cake can do really well is to take a deep 
queue full of consecutive line-rate bursts and turn them into interleaved 
packet streams, which are at least slightly better “paced” than the originals. 
They also specifically try to avoid burst loss and (at least in Cake’s case) 
tail loss.
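
In heavily simplified form, the interleaving looks something like this Python 
sketch - real fq_codel and Cake use DRR with byte quanta, set-associative flow 
hashing and a per-queue AQM, none of which is modelled here:

from collections import deque
from itertools import cycle

# Two flows each dump a back-to-back burst into their own queue...
flows = {
    "A": deque(f"A{i}" for i in range(4)),
    "B": deque(f"B{i}" for i in range(4)),
}

# ...and a round-robin dequeue sends them out interleaved.
out = []
for name in cycle(list(flows)):
    if not any(flows.values()):
        break
    if flows[name]:
        out.append(flows[name].popleft())

print(out)    # ['A0', 'B0', 'A1', 'B1', ...] - the bursts leave interleaved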

It is of course regrettable that this behaviour conflicts with the assumptions 
of most network acceleration hardware, and that maximum throughput might 
therefore be compromised.  The *qualitative* behaviour is, however, improved.

> I would suggest that there are other ways of dealing with the impact of 
> “peak” (i.e. where instantaneous demand exceeds supply over a long enough 
> timescale to start affecting the most delay/loss sensitive application in the 
> collective multiplexed stream).

Such as signalling to the traffic that congestion exists, and to please slow 
down a bit to make room?  ECN and AQM are great ways of doing that, especially 
in combination with flow isolation - the latter shares out the capacity fairly 
on short timescales, *and* avoids the need to signal congestion to flows which 
are already using less than their fair share.
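
The nice property of that combination, in sketch form (this is just the idea, 
not CoDel or Cake itself, and all the numbers are invented for illustration):

# With flow isolation, the congestion signal (an ECN mark, or a drop for
# non-ECN traffic) is applied per flow, based on that flow's own standing
# queue delay.  Flows already under their share build no standing queue
# and are left alone.

DELAY_TARGET_MS = 5.0     # invented threshold for the example

flow_queue_delay_ms = {
    "voip":        0.4,   # sparse flow, well under its share - no standing queue
    "dns":         0.1,   # likewise
    "bulk-upload": 37.0,  # saturating flow with a standing queue of its own making
}

for flow, delay in flow_queue_delay_ms.items():
    signal = "mark/drop" if delay > DELAY_TARGET_MS else "leave alone"
    print(f"{flow:12s} {delay:5.1f} ms -> {signal}")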

> I would also agree that if all the streams have the same “bound on delay 
> and loss” requirements (i.e. *all* Netflix), and the load exceeds 100% of 
> capacity over the appropriate timescale (which for Netflix VoD streaming is 
> about 20s to 30s), then end-user disappointment is the only thing that can 
> occur.

I think the importance of measurement timescales is consistently underrated 
in industry and academia alike.  An hour-long bucket of traffic tells you 
about a very different set of characteristics than a millisecond-long bucket, 
and there are several timescales of great practical interest between those 
extremes.
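
A trivial Python example makes the point; the trace here is entirely 
synthetic - one millisecond of line-rate burst in every hundred:

LINE_RATE = 1000.0    # arbitrary units per millisecond

# 60 seconds of traffic: one millisecond at line rate, then 99 ms idle, repeated.
trace_ms = [LINE_RATE if (t % 100) == 0 else 0.0 for t in range(60_000)]

for bucket_ms in (1, 100, 10_000, 60_000):
    buckets = [sum(trace_ms[i:i + bucket_ms]) / bucket_ms
               for i in range(0, len(trace_ms), bucket_ms)]
    print(f"bucket {bucket_ms:6d} ms: peak {max(buckets):7.1f}, "
          f"mean {sum(buckets) / len(buckets):7.1f}")

The minute-long average says the link is almost idle; the millisecond view 
says it is periodically saturated.  Both are true, but only the latter 
predicts the queueing behaviour.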

 - Jonathan Morton
