[removing Lars and Jim from direct cc, don't want to spam them - I don't know if they're sooo interested in this thread?]
On 23. aug. 2014, at 01:50, David Lang <[email protected]> wrote:

> On Sat, 23 Aug 2014, Michael Welzl wrote:
>
>> On 21. aug. 2014, at 10:30, David Lang <[email protected]> wrote:
>>
>>> On Thu, 21 Aug 2014, Michael Welzl wrote:
>>>
>>>> On 21. aug. 2014, at 08:52, Eggert, Lars wrote:
>>>>
>>>>> On 2014-8-21, at 0:05, Jim Gettys <[email protected]> wrote:
>>>>>
>>>>>> And what kinds of APs? All the 1G guarantees you is that your bottleneck is in the wifi hop, and they can suffer as badly as anything else (particularly consumer home routers).
>>>>>>
>>>>>> The reason why 802.11 works OK at IETF and NANOG is that:
>>>>>> o) they use Cisco enterprise APs, which are not badly overbuffered.
>>>>
>>>> I'd like to better understand this particular bloat problem:
>>>>
>>>> 100s of senders try to send at the same time. They can't all do that, so their cards retry a fixed number of times (10 or something, I don't remember, probably configurable), for which they need to have a buffer.
>>>>
>>>> Say the buffer is too big. Say we make it smaller. Then an 802.11 sender trying to get its time slot in a crowded network will have to drop a packet, requiring the TCP sender to retransmit the packet instead. The TCP sender will think it's congestion (not entirely wrong) and reduce its window (not entirely wrong either). How appropriate TCP's cwnd reduction is probably depends on how "true" the notion of congestion is... i.e., if I can buffer only one packet and just don't get to send it, or it gets a CRC error ("collides" in the air), then that can be seen as a pure matter of luck. Then I provoke a sender reaction that's like the old story of TCP misinterpreting random losses as a sign of congestion. I think in most practical systems this old story is now a myth, because wireless equipment will try to buffer data for a relatively long time instead of exhibiting sporadic random drops to upper layers. That is, in principle, a good thing - but buffering too much has of course all the problems that we know. Not an easy trade-off at all, I think.
>>>
>>> In this case the loss is a direct sign of congestion.
>>
>> "This case" - I talk about different buffer lengths. E.g., take the minimal buffer that would just function, and set retransmissions to 0. Then a packet loss is a pretty random matter: just because you and I contended doesn't mean that the net is truly "overloaded". So my point is that the buffer creates a continuum from "random loss" to "actual congestion" - we want loss to mean "actual congestion", but how large should the buffer be to meaningfully convey that?
>>
>>> Remember that TCP was developed back in the days of 10base2 networks, where everyone on the network was sharing a wire and it was very possible for multiple senders to start transmitting on the wire at the same time, just like with radio.
>>
>> Cable or wireless: is one such occurrence "congestion"? I.e., is halving the cwnd really the right response to that sort of "congestion"? (Contention, really.)
>
> Possibly not, but in practice it may be 'good enough'.
>
> But to make it work well, you probably want to play games with how much you back off, and how quickly you retry if you don't get a response.
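To make those "games" concrete, here is a minimal sketch of the DCF-style behaviour being discussed: double the contention window on every failed attempt, give up after a fixed retry limit, and only then surface the loss to TCP. Purely illustrative - the constants are assumptions rather than exact 802.11 parameters, and a real MAC works in slot times rather than function calls:

import random

# Illustrative constants (assumptions, not exact 802.11 values):
CW_MIN = 15        # initial contention window, in backoff slots
CW_MAX = 1023      # cap on the contention window
RETRY_LIMIT = 7    # attempts before the frame is dropped

def try_send(p_collision):
    """Send one frame with binary exponential backoff.

    p_collision models the chance that any single attempt collides
    with another sender. Returns (delivered, attempts_used).
    """
    cw = CW_MIN
    for attempt in range(1, RETRY_LIMIT + 1):
        _slot = random.randint(0, cw)     # pick a backoff slot (waiting not modelled)
        if random.random() > p_collision:
            return True, attempt          # got the medium: success
        cw = min(2 * cw + 1, CW_MAX)      # collision: double the window, retry
    return False, RETRY_LIMIT             # give up: TCP will see a loss

if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        drops = sum(1 for _ in range(10000) if not try_send(p)[0])
        print(f"p_collision={p}: {drops / 100:.1f}% of frames dropped")

With p_collision = 0.1 essentially nothing is dropped (0.1^7 is negligible), while at 0.9 roughly 0.9^7, i.e. about 48%, of frames exhaust the retry limit. Under heavy enough contention a fixed retry budget does eventually turn contention into loss that TCP reads as congestion - exactly the continuum described above.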
> The fact that the radio link can have its own ack for the packet can actually be an improvement over doing it at the TCP level, as it only needs to ack/retry for that hop; if that hop was good, there's far less of a need to retry if the server is just slow.

Yep... I remember a neat paper from colleagues at Trento University that piggybacked TCP's ACKs on link-layer ACKs, thereby avoiding the collisions between TCP's ACKs and other data packets - really nice. Not sure if it wasn't just simulations, though.

> So if we try to do the retries in the OS stack, it will need to know the difference between "failed to get out the first hop due to collision" and "got out the first hop, waiting for the server across the globe to respond", with different timeouts/retries for them.
>
>>> A large part of the problem with high-density wifi is that it just wasn't designed for that sort of environment, and there are a lot of things it does that work great for low-density, weak-signal environments but just make the problem worse for high-density environments:
>>>
>>> batching packets together
>>> slowing down the transmit speed if you aren't getting through
>>
>> Well... this *should* only happen when there's an actual physical signal quality degradation, not just collisions. At least minstrel does quite a good job of ensuring that, most of the time.
>
> "should" :-)
>
> But can the firmware really tell the difference between quality degradation due to interference and collisions with other transmitters?

Well, with heuristics it can, sort of. As a simple example from one older mechanism, consider: multiple consecutive losses are *less* likely to come from random collisions than from link noise. That sort of thing (a tiny sketch of such a classifier follows below). Minstrel worked best in our tests, using tables of rates that worked well / didn't work well in the past:
http://heim.ifi.uio.no/michawe/research/publications/wowmom2012.pdf

>>> retries of packets that the OS has given up on (including when the user has closed the app that sent them)
>>>
>>> Ideally we want the wifi layer to be just like the wired layer: buffer only what's needed to get it on the air without 'dead air' (where the driver is waiting for the OS to give it more data). At that point, we can do the retries from the OS as appropriate.
>>>
>>>> I have two questions: 1) is my characterization roughly correct? 2) have people investigated the downsides (negative effect on TCP) of buffering *too little* in wireless equipment? (I suspect so?) Finding where "too little" begins could give us a better idea of what the ideal buffer length should really be.
>>>
>>> Too little buffering will reduce the throughput as a result of unused airtime.
>>
>> So that's a function of, at least: 1) the incoming traffic rate; 2) the number of retries * f(MAC behavior; number of other senders trying).
>
> Incoming to the AP, you mean?

Incoming to whoever is sending and would be retrying - mostly the AP, yes.

> It also matters whether you are worrying about aggregate throughput for a lot of users or per-connection throughput for a single user.
>
> From a sender's point of view, if it takes 100 time units to send a packet and 1-5 time units to queue the next packet for transmission, you lose a few percent of your possible airtime and there's very little concern.
>
> But if it takes 10 time units to send the packet and 1-5 time units to queue the next packet, you have just lost a lot of potential bandwidth.
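As promised above, a tiny sketch of that consecutive-loss heuristic. It is purely illustrative: the run threshold is an assumed tuning knob, not a value taken from minstrel or the paper. The idea is that collisions hit frames more or less independently, while link noise persists, so a run of back-to-back losses points to noise:

def classify_loss(loss_history, run_threshold=3):
    """Guess whether recent losses stem from link noise or collisions.

    loss_history: list of booleans, newest last (True = frame lost).
    run_threshold: assumed knob; this many consecutive losses is taken
    as a sign of link noise, since independent collisions rarely line
    up back-to-back.
    """
    run = 0
    for lost in reversed(loss_history):
        if not lost:
            break
        run += 1
    return "noise" if run >= run_threshold else "collision"

# A rate controller would only slow down on "noise": dropping the rate
# in response to collisions makes each frame occupy more airtime and
# feeds the very contention it is reacting to.
print(classify_loss([False, True, True, True]))  # -> noise
print(classify_loss([True, False, True]))        # -> collision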
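David's time-unit arithmetic is worth making explicit: the usable fraction of airtime is t_send / (t_send + t_gap), where t_gap is the dead air spent waiting for the OS to queue the next packet. A short illustration, using the numbers from the mail above:

def airtime_efficiency(t_send, t_gap):
    """Fraction of the channel time actually spent transmitting."""
    return t_send / (t_send + t_gap)

print(f"{airtime_efficiency(100, 5):.0%}")  # ~95%: slow link, the gap barely matters
print(f"{airtime_efficiency(10, 5):.0%}")   # ~67%: fast link, a third is dead air

So the faster the link, the more a fixed per-packet gap costs a single sender - which is why keeping at least one packet ready (and batching packets together, as mentioned above) matters so much more at high rates.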
> But from the point of view of the aggregate, these gaps just give someone else a chance to transmit and have very little effect on the amount of traffic arriving at the AP.
>
> I was viewing things from the point of view of the app on the laptop.

Yes... I agree, and that's the more common + more reasonable way to think about it. I tend to think upstream, which of course is far less common, but maybe even more problematic. Actually, I suspect the following: things get seriously bad when a lot of senders are sending upstream together. This doesn't really happen much in practice - BUT when we have a very, very large number of hosts connected in a conference-style situation, all the HTTP GETs and SMTP messages and whatnot *do* create lots of collisions, a situation that isn't really too common (and maybe not envisioned / parametrized for), and that's why things often get so bad. (At least one of the reasons.)

>>> But at the low data rates involved, the system would have to be extremely busy for this to add up to a significant amount of time, even if only one packet at a time is buffered.
>>>
>>> You are also conflating the effect of the driver/hardware buffering with it doing retries.
>>
>> Because of the "function" I wrote above: the more you retry, the more you need to buffer when traffic continuously arrives, because you're stuck trying to send a frame again.
>
> Huh, I'm missing something here; retrying sends would require you to buffer more when sending.

Aren't you saying the same thing as I am? Sorry if I expressed it confusingly somehow.

> If people are retrying when they really don't need to, that cuts down on the available airtime.

Yes.

> But if you have continual transmissions taking place, so you have a hard time getting a chance to send your traffic, then you really do have congestion and should be dropping packets to let the sender know that it shouldn't try to generate as much.

Yes; but the complexity that I was pointing at (though maybe it's a simple parameter, more like a 0-or-1 situation in practice?) lies in the word "continual". How long do you try before you decide that the sending TCP should really think it *is* congestion? To really optimize the behavior, that would have to depend on the RTT, which you can't easily know (a sketch of what an RTT-scaled retry deadline might look like follows below).

Cheers,
Michael
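On that last point, a sketch of what an RTT-scaled retry deadline could look like. Entirely hypothetical: the function, the factor k, and the idea that the link layer knows an RTT estimate at all are assumptions - as noted above, a driver normally can't easily know per-flow RTTs, which is precisely the difficulty:

import random
import time

def send_with_rtt_budget(transmit, srtt_estimate, k=0.5):
    """Retry a frame only while a fraction of the flow's RTT remains.

    transmit() models one MAC-layer attempt (returns True on success).
    k is an assumed tuning factor: retrying for longer than k * RTT
    mostly delays TCP's own loss recovery, so past the deadline the
    frame is dropped and TCP gets to interpret the loss as congestion.
    """
    deadline = time.monotonic() + k * srtt_estimate
    while time.monotonic() < deadline:
        if transmit():
            return True
    return False  # budget exhausted: surface the loss to TCP

# Toy usage: each attempt takes ~1 ms and succeeds 30% of the time.
def attempt():
    time.sleep(0.001)
    return random.random() < 0.3

print(send_with_rtt_budget(attempt, srtt_estimate=0.020))  # ~10 ms budget

A flow with a 20 ms RTT would get roughly 10 ms of link-layer persistence before the loss is exposed, while a 200 ms intercontinental flow could be shielded for around 100 ms - that is the RTT dependence the question above points at.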
