>> 
>>>> The end-to-end argument applies:  Ultimately, there needs to be 
>>>> resequencing at the end anyway, so any reordering in the network would be 
>>>> a performance optimization.  It turns out that keeping packets lying 
>>>> around in some buffer somewhere in the network just to do resequencing 
>>>> before they exit an L2 domain (or a tunnel) is a pessimization, not an 
>>>> optimization.
>>> 
>>>     I do not buy the end-to-end argument here, because in the extreme, why 
>>> do ARQ on individual links at all? We could just leave it to the endpoints 
>>> to do the ARQ, which TCP does anyway.
>> 
>> The optimization is that the retransmission on a single link (or within a 
>> path segment, which is what I’m interested in) does not need to span the 
>> entire end-to-end path.  That is strictly better than an end-to-end 
>> retransmission.  
> 
>       I agree, and by the same logic local resequencing is also better,

Non sequitur.  The same logic simply does not apply.  A resequenced packet 
consumes the same transmission resources.  (It also consumes more buffer 
resources.  So it is strictly worse when just looking at network resources 
expended, which is the basis for the kind of logic applied here.)

> unless the re-ordering event happened at the bottleneck link.

Not sure how this comes in now.

>> Also, a local segment may allow faster recovery by not implicating the 
>> entire e2e latency, which gives strictly better latency.
>> So, yes, there are significant optimizations in doing local retransmissions, 
>> but there are also interesting interactions with end-to-end retransmission 
>> that need to be taken care of.  This has been known for a long time, e.g., 
>> see https://tools.ietf.org/html/rfc3819#section-8 which documents things 
>> that were considered to be well known in the early 2000s.
> 
>       Thanks, but my understanding of this is basically that a link should 
> just drop a packet unless it can be retransmitted with reasonable effort 
> (like the G.INP retransmissions on DSL links, which will give up); sure, we 
> can argue about what "reasonable effort" is in reality, but I fear that if we 
> move away from 3 dupACKs to, say, X ms, all transport links will assume they 
> have leeway to allow re-ordering close to X, which will certainly be worse 
> than today. And since I am an end-user and do not operate a transport 
> network, I know what I prefer here…

I’m sorry, I grew up as a transport-layer guy, so “transport” means L4 (the 
transport layer) to me, not “transport network”.
You may want to re-read my sentences with that knowledge; they might make more 
sense.

>> Resequencing (which is the term I prefer for putting things back in sequence 
>> again, after they have been reordered) requires storing packets that are 
>> ahead of later packets.
> 
>       Obviously.
> 
>> This is strictly suboptimal if these packets could be delivered instead (in 
>> contrast, it *is* a good idea to resequence packets that are in a queue 
>> waiting for a transmission opportunity).
> 
>       Fair enough, but that basically expects the bottleneck link that 
> actually accumulates a queue to do the heavy lifting, not sure that the 
> economic incentives are properly aligned here.

It can actually do so more easily, because the speeds are lower.
But deployment-economics arguments are interesting as well; I was making 
theoretical arguments first.

>> So *requiring*(*) local path segments to resequence is strictly suboptimal.
>> 
>> (*) even if this is not a strict requirement, but just a statement of the 
>> form “the transport will be much more efficient if you deliver in order”.
> 
>       My point is the transport will be much more useful if it undertakes 
> (reasonable) effort to deliver in-order,

Please re-read as advised above.

> that is slightly different, and I understand that those responsible for 
> transport networks have a different viewpoint on this.
> 
>> 
>>> To put numbers to my example, assume I am on a 1/1 Mbps link and I get TCP 
>>> data at a 1 Mbps rate in MTU-1500 packets (I am going to keep the numbers 
>>> approximate), and I get a burst of say 10 packets containing say 10 
>>> individual messages for my application, telling it the position of say an 
>>> object in 3D space.
>>> 
>>> Each packet is going to "hog" the link for: 
>>> 1000 ms/s * (1500 * 8 b/packet) / (1000 * 1000 b/s) = 12 ms.
>>> So I get access to messages/new positions every 12 ms and I can display 
>>> this smoothly.
>> 
>> That is already broken by design.
> 
>       Does not matter much; a well-designed network should also allow one to 
> do stupid things…

Sure, but it won’t work very well then (and there is no point in optimizing for 
that — remember: all in-network work is just an optimization under the 
end-to-end principle).

>> If you are not accounting for latency variation (“jitter”), you won’t be 
>> able to deal with it.
> 
>       Which would just complicate the issue a bit if we introduced, say, a 
> 25 ms de-jitter buffer, without affecting the gist of it.

That buffer increases the total latency but also the (useful) packet delivery 
rate in the presence of reordering.
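
For concreteness, a fixed-depth de-jitter buffer trades a constant added
delay for tolerance of displacement up to that depth. A minimal sketch
with made-up timings (not modelled on any particular implementation):

  # Fixed-delay de-jitter buffer: packet i is played out at
  # send_time[i] + D, so any packet arriving within D of its send
  # time can still be consumed in order.
  D = 25.0  # de-jitter depth in ms

  # (send_ms, arrival_ms, seq); seq 0 is re-ordered behind seq 1
  packets = [(0, 20, 0), (12, 13, 1), (24, 25, 2)]

  for send, arrival, seq in sorted(packets, key=lambda p: p[2]):
      playout = send + D
      status = "in time" if arrival <= playout else "late (gap!)"
      print(f"seq {seq}: arrives {arrival} ms, plays out at {playout} ms"
            f" -> {status}")

The 20 ms displacement of seq 0 is absorbed, but every packet,
re-ordered or not, now carries the extra 25 ms of buffer delay.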

>> Your example also makes sure it does not work well by being based on 100 % 
>> utilization.
> 
>       Same here, access links certainly run closer to 100% utilization than 
> core links, so operation at full saturation is not completely unrealistic, 
> but I really just set it up that way for clarity.

Please use an example that is more realistic.

>>> Now if the first packet gets re-ordered to be last, I either drop that packet
>> 
>> …which is another nice function the network could do for you before 
>> expending further resources on useless delivery; see e.g. 
>> draft-ietf-6lo-deadline-time for one way to do this.
> 
>       Yes, but typically I do not want the network to do this, as I would be 
> quite interested in knowing how much too late the packet arrived.

I don’t know how to make use of that knowledge, do you?
Early discarding of a late packet (e.g., by not retransmitting it in the first 
place) is so much better.
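
What I have in mind is roughly the following check at a forwarding node
(a hypothetical Python sketch of the idea behind
draft-ietf-6lo-deadline-time; the function and parameter names are made
up, not the draft's actual encoding):

  # A packet that cannot arrive before its deadline is useless
  # downstream, so drop it here (or decline to retransmit it in the
  # first place) and save every remaining hop the cost of carrying it.
  def should_forward(deadline_ms: float, now_ms: float,
                     residual_path_ms: float = 0.0) -> bool:
      return now_ms + residual_path_ms < deadline_ms

  print(should_forward(deadline_ms=100.0, now_ms=80.0))   # True: forward
  print(should_forward(deadline_ms=100.0, now_ms=120.0))  # False: drop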

>>> and accept a 12 ms gap, or, if that is not an option, I get to wait 9*12 = 
>>> 108 ms before positions can be updated; that IMHO shows why re-ordering is 
>>> terrible even if TCP were more tolerant. 
>> 
>> You are assuming that the network can magically resequence a packet into 
>> place that it does not have.
> 
>       All I expect is that the network makes a reasonable effort to undo 
> re-ordering close to where re-ordering happened.

All I’m trying to say is that this is bad engineering, apparently perpetuated 
by bad transport layer implementations.

>> Now I do understand that forwarding an out-of-order packet will block the 
>> output port for the time needed to serialize it.  So if you get it right 
>> before what would have been an in-order packet, the latter incurs additional 
>> latency.  Note that this requires a bottleneck configuration, i.e., packets 
>> to be forwarded arrive faster than they can be serialized out.  Don’t do 
>> bottlenecks if you want ultra-low latency.  (And don’t do links where you 
>> need to retransmit, either.)
> 
>       I agree, but that is life with a home internet access link; the 
> bottleneck is there. This also points out a problem with the L4S argument for 
> end-users, as the ultra-low latency (their words, not mine) will not 
> materialize for end-users anywhere close to what the project seems to promise.

I think reordering is not really a problem for ultra-low latency; or, more 
specifically, once reordering happens, you are no longer in the ultra-low 
latency domain.

>>> Especially in the context of L4S something like this seems to be totally 
>>> unacceptable if ultra-low latency is supposed to be anything more than 
>>> marketing. 
>> 
>> Dropping packets that can’t be used anyway is strictly better than 
>> delivering them.
> 
>       Well, not for L4S, as TCP Prague is supposed to fall back to legacy 
> congestion control behavior upon encountering packet drops…

L4S is for reliable transport, which is a different scenario than the one that 
benefits a lot from deadlines for packets.  (Well, deadlines might be used to 
make sure there is no dual retransmission, both local and end-to-end, but 
again, this is not where you would use L4S.)

>> But apart from that, forwarding packets that I have is strictly better for 
>> low latency than leaving the output port idle and waiting for 
>> previous-in-order packets to send them out in sequence.
> 
>       It really depends on what we mean when we talk about latency here; as 
> shown, for an end-user that might be quite different…

Apart from the port blocking effect I talked about (which is mostly relevant 
for highly scheduled transmission schemes), I really have no idea how the 
end-to-end latency would benefit from sitting on packets while the port is idle.
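
A back-of-the-envelope sketch with made-up numbers (packet 1 delayed to
t=100 ms, packets 2..5 arriving every 12 ms, serialization times
ignored) shows what sitting on packets costs:

  arrivals = {1: 100, 2: 12, 3: 24, 4: 36, 5: 48}  # ms

  # Immediate (out-of-order) forwarding: deliver on arrival.
  immediate = dict(arrivals)

  # Hold for in-order delivery: nothing behind the gap leaves before
  # packet 1 shows up at t=100.
  in_order = {seq: max(t, arrivals[1]) for seq, t in arrivals.items()}

  for seq in sorted(arrivals):
      print(f"seq {seq}: immediate {immediate[seq]:3} ms, "
            f"in-order {in_order[seq]:3} ms")

Packets 2 through 5 go from 12..48 ms up to a uniform 100 ms, and
nothing whatsoever gets faster in return.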

>>>> For three decades now, we have acted as if there is no cost for in-order 
>>>> delivery from L2 — not because that is true, but because deployed 
>>>> transport protocol implementations were built and tested with simple links 
>>>> that don’t reorder.  
>>> 
>>>     Well, that is similar to the argument for performing non-aligned loads 
>>> fast in hardware: yes, this comes with a considerable cost in complexity, 
>>> and it is harder to make this go fast than just allowing aligned loads and 
>>> fixing up unaligned loads by trapping to software, but from a user 
>>> perspective the fast hardware beats the fickle "only make aligned loads go 
>>> fast" approach any old day.
>> 
>> CPUs have an abundance of transistors you can throw at this problem, so 
>> support for unaligned loads has become standard practice for CPUs with 
>> enough transistors.
>> I’m not sure this argument transfers, because this is not about transistors 
>> (except maybe when we talk about in-queue resequencing, which would be a 
>> nice feature if we had information in the packets to allow it).
> 
> Like the 5-tuple in TCP and UDP?

That doesn’t help.  I need a sequence number for resequencing, and I can’t use 
the transport layer one because that is being encrypted.  Again, this is mostly 
theoretical as I don’t see people rushing to do in-queue resequencing any time 
soon.
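
If we did have such a number, the queue discipline itself would be
trivial. A purely hypothetical sketch (the network-visible sequence
number assumed here does not exist for encrypted transports, which is
the whole point):

  import bisect

  class ResequencingQueue:
      """Packets waiting for a transmission opportunity are kept
      sorted by a (hypothetical) network-visible sequence number,
      so re-ordering is undone without ever idling the port."""
      def __init__(self):
          self._q = []  # sorted list of (seq, packet)

      def enqueue(self, seq, packet):
          bisect.insort(self._q, (seq, packet))

      def dequeue(self):
          # Always transmit something when the port frees up; never
          # hold the line waiting for a missing sequence number.
          return self._q.pop(0) if self._q else None

  q = ResequencingQueue()
  for seq in (3, 1, 2):   # arrive out of order while the port is busy
      q.enqueue(seq, f"pkt{seq}")
  print([q.dequeue() for _ in range(3)])
  # -> [(1, 'pkt1'), (2, 'pkt2'), (3, 'pkt3')]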

(Skipping some text that is not relevant to my argument here.)

>> Where does this number come from?  100 ms is pretty long as a reordering 
>> maximum for most paths outside of satellite links. Instead, you would do 
>> something based on an RTT estimate.
> 
>       I just made that number up, as the exact N does not matter; the argument 
> is that whatever we set as the new threshold will be approached by transport 
> characteristics. Then again, having something that inversely scales with 
> bandwidth is certainly terrible from a transport perspective, so I can 
> understand the argument for a fixed temporal threshold.

I don’t follow at all here.

>>>> at least within some limits that we still have to find.
>>>> That probably requires some evolution at the end-to-end transport 
>>>> implementation layer.  We are in a better position to make that happen 
>>>> than we have been for a long time.
>>> 
>>>     Probably true, but also not very attractive from an end-user 
>>> perspective… unless this enables transport innovations that allow 
>>> massively more bandwidth at a smallish latency cost.
>> 
>> The argument against in-network resequencing is mostly a latency argument 
>> (but, as a second order effect, that reduced latency may also allow more 
>> throughput), so, again, I don’t quite understand.
> 
>       As I tried to show for TCP the flow with re-ordered packets certainly 
> pays a latency cost that especially if re-ordering does not happen on the 
> bottleneck link but at a faster link could be smaller.

I can’t parse this sentence, but my main point remains:

In-network resequencing increases latency (with a potential impact on 
throughput, too), unless it happens within a queue.  We wouldn’t want to do 
that, unless forced by a transport protocol that can’t cope.  If we can fix the 
transport protocols to enable (out-of-order) immediate forwarding, then let’s 
do it; this might also enable doing more in-network recovery, with the 
attendant performance improvements.

Grüße, Carsten
