[aqm] thoughts on operational queue settings (Re: [tsvwg] CC/bleaching thoughts for draft-ietf-tsvwg-le-phb-04) (fwd)

Mikael Abrahamsson Wed, 11 Apr 2018 23:46:29 -0700

Hi,

I sent this to tsvwg, where we're discussing the LE codepoint. Since I am now talking about queue settings, I thought it might be interesting to get feedback from this group as well on what advice we should give operators.

Please take into account that I am aiming for what is possible on currently deployed platforms, as seen in the field, not what might be possible on future hardware/software. So what is generally available is (W)RED per queue and a few queues per customer.

I am also going to test a 3-queue setup, where each of these groups of DSCP values would go into a different queue. LE would perhaps be assured 5% of BW, with the rest split evenly between a BE queue and an "everything else" queue. If I did that, I would probably not start dropping LE traffic until 10-20ms buffer fill.
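As a rough illustration of the split described above, here is a minimal Python sketch of the arithmetic. The function name `three_queue_shares` and the 800 Mbit/s example rate (borrowed from the test in the forwarded message) are assumptions made for illustration only; this is not a router configuration.

```python
def three_queue_shares(link_mbps, le_fraction=0.05):
    """Split a link between LE, BE and an "everything else" queue.

    LE is assured le_fraction of the link; the remainder is split
    evenly between the other two queues. Hypothetical helper, used
    only to illustrate the arithmetic.
    """
    if not 0.0 <= le_fraction <= 1.0:
        raise ValueError("le_fraction must be between 0 and 1")
    le = link_mbps * le_fraction
    rest = (link_mbps - le) / 2
    return {"LE": le, "BE": rest, "other": rest}


if __name__ == "__main__":
    # 800 Mbit/s is the shaped rate used in the forwarded test.
    for queue, mbps in three_queue_shares(800).items():
        print(f"{queue}: {mbps:.1f} Mbit/s assured")
```

With a 5% LE assurance, each of the other two queues ends up with 47.5% of the link.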


---------- Forwarded message ----------
Date: Thu, 12 Apr 2018 08:39:25 +0200 (CEST)
From: Mikael Abrahamsson <[email protected]>
To: Brian E Carpenter <[email protected]>
Cc: [email protected]
Subject: thoughts on operational queue settings (Re: [tsvwg] CC/bleaching
    thoughts for draft-ietf-tsvwg-le-phb-04)

On Thu, 12 Apr 2018, Brian E Carpenter wrote:

> BE and LE PHBs should talk about queueing and dropping behaviour, not about capacity share, in any case. It's clear that on a congested link, LE is sacrificed first - peak hour LE throughput might be precisely zero, which is not acceptable for BE.

I have received questions from operational people asking for configuration examples for how to handle LE/BE etc. So I did some work in our lab to give some kind of example.

So my first goal was to figure out something that'd do something reasonable on a platform that'll only do DSCP-based RED (as this has typically been available on platforms going back 15 years). This is not optimal, but at least it would be deployable on lots of platforms currently installed and moving packets for customers.

The test was performed with 30ms of RTT, 10 parallel TCP sessions per diffserv RED curve, and 800 megabit/s access speed (it's really gig, but in my lab setup I have some constraints that meant if I set it to gig I might get some uncontrolled packet loss due to other equipment sitting on the same shared link, so I opted for 800 megabit/s as "close enough").

What I came up with, which would give LE ~10% of access bandwidth compared to BE and a slight advantage for anything that is not BE/LE (the goal was to give this traffic a lossless experience), was this:

This is a Cisco ASR9k that, without this RED configuration, will buffer packets up to ~90 milliseconds, resulting in 120ms RTT (30ms path RTT and 90ms buffer-bloat).


 class class-default
  shape average 800 mbps
  random-detect dscp 1,8 1 ms 500 ms
  random-detect dscp 0,2-7 5 ms 1000 ms
  random-detect dscp 9-63 10 ms 1000 ms

This basically says that for LE and CS1, start dropping packets at 1ms of buffer fill. Since some applications use CS1 for scavenger, it made sense to me to treat CS1 and LE the same.

For BE (which I made to be DSCP 0,2-7), start dropping packets at 5ms buffer fill, less aggressively compared to LE.

For the rest, don't start dropping packets until 10ms buffer fill, giving it a slight advantage (the thought here being that gaming traffic etc. should not see many drops, even though it will see some induced RTT because of BE traffic).
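To make the three curves easier to compare, here is a minimal Python sketch of a textbook linear RED curve using the min/max thresholds from the configuration above. It assumes the drop probability rises linearly from 0 at the minimum threshold to `max_p` at the maximum threshold, and that it is applied to the instantaneous queue delay. Both are simplifying assumptions, and `RedCurve`, `drop_probability` and `max_p` are hypothetical names; the actual averaging and maximum drop probability on the platform may well differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedCurve:
    name: str
    min_ms: float  # queue delay at which random drops begin
    max_ms: float  # queue delay at which every packet is dropped


def drop_probability(curve, queue_delay_ms, max_p=1.0):
    """Textbook linear RED: 0 below min, ramp up to max_p, 1 at/above max.

    max_p is an assumed parameter; it is not taken from the router.
    """
    if queue_delay_ms < curve.min_ms:
        return 0.0
    if queue_delay_ms >= curve.max_ms:
        return 1.0
    span = curve.max_ms - curve.min_ms
    return max_p * (queue_delay_ms - curve.min_ms) / span


# Thresholds copied from the three random-detect lines.
CURVES = [
    RedCurve("LE/CS1 (dscp 1,8)", 1.0, 500.0),
    RedCurve("BE (dscp 0,2-7)", 5.0, 1000.0),
    RedCurve("other (dscp 9-63)", 10.0, 1000.0),
]

if __name__ == "__main__":
    for delay_ms in (0.5, 2.0, 5.0, 7.5, 10.0, 20.0):
        row = ", ".join(
            f"{c.name}: {drop_probability(c, delay_ms):.4f}" for c in CURVES
        )
        print(f"{delay_ms:5.1f} ms -> {row}")
```

Under these assumptions, at any given buffer fill LE/CS1 sees the highest drop probability, BE a lower one, and the remaining DSCP values see none until the queue reaches 10ms.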

This typically results in LE using approximately 30-50 megabit/s when there are 10 LE TCP sessions and 10 BE TCP sessions, all trying to go full out. The BE sessions then get ~750 megabit/s. The added buffer delay is around 5-10ms, as that's where the BE sessions settle their BW usage. The platform unfortunately doesn't support ECN marking.
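For a sense of scale, the figures above can be turned into link shares and queue depths with a little arithmetic. This is a minimal Python sketch; `share_percent` and `queue_bytes` are hypothetical helper names, and the only inputs are the numbers already quoted (800 megabit/s shaped rate, 30-50 and ~750 megabit/s throughput, 5-10ms added delay, ~90ms unconfigured buffer).

```python
LINK_MBPS = 800  # shaped access rate used in the test


def share_percent(rate_mbps, link_mbps=LINK_MBPS):
    """Throughput expressed as a percentage of the shaped link rate."""
    return 100.0 * rate_mbps / link_mbps


def queue_bytes(delay_ms, link_mbps=LINK_MBPS):
    """Bytes that must be queued to add delay_ms at the link rate.

    1 Mbit/s for 1 ms is 1000 bits, so bits = Mbit/s * ms * 1000.
    """
    return link_mbps * delay_ms * 1000 / 8


if __name__ == "__main__":
    for label, mbps in (("LE low", 30), ("LE high", 50), ("BE", 750)):
        print(f"{label}: {share_percent(mbps):.2f}% of the link")
    for delay_ms in (1, 5, 10, 90):
        print(f"{delay_ms} ms of buffer fill = {queue_bytes(delay_ms):,.0f} bytes")
```

At 800 megabit/s, each millisecond of buffer fill corresponds to 100,000 bytes of queued data.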

If I were to spend queues on this traffic instead of using RED, I would do this differently. I will do more tests with lower speeds etc. This was just initial testing for one use-case, but also meant to give an example of what can be done on currently shipping platforms. I know there are much better ways of doing this, but I want this into networks NOW, not in 5-10 years. So the easier the advice, the better the chance we get this into production networks.

I don't think it's a good idea to give CS1/LE no bandwidth at all, as that might cause failure cases we can't predict. I prefer to give LE traffic a big disadvantage, so that it might only get 5-10% or so of the bandwidth when there is competing traffic.

I will do more testing; I have several typical platforms available to me that are in wide use.


--
Mikael Abrahamsson    email: [email protected]

_______________________________________________
aqm mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/aqm
