[removing Lars and Jim from direct cc, don't want to spam them - I don't know if they're sooo interested in this thread?]
On 23. aug. 2014, at 01:50, David Lang <[email protected]> wrote:

> On Sat, 23 Aug 2014, Michael Welzl wrote:
>
>> On 21. aug. 2014, at 10:30, David Lang <[email protected]> wrote:
>>
>>> On Thu, 21 Aug 2014, Michael Welzl wrote:
>>>
>>>> On 21. aug. 2014, at 08:52, Eggert, Lars wrote:
>>>>
>>>>> On 2014-8-21, at 0:05, Jim Gettys <[email protected]> wrote:
>>>>>
>>>>>> And what kinds of APs? All the 1G guarantees you is that your bottleneck is in the wifi hop, and they can suffer as badly as anything else (particularly consumer home routers).
>>>>>>
>>>>>> The reason why 802.11 works OK at IETF and NANOG is that:
>>>>>> o) they use Cisco enterprise APs, which are not badly overbuffered.
>>>>
>>>> I'd like to better understand this particular bloat problem:
>>>>
>>>> 100s of senders try to send at the same time. They can't all do that, so their cards retry a fixed number of times (10 or something, I don't remember, probably configurable), for which they need to have a buffer.
>>>>
>>>> Say the buffer is too big. Say we make it smaller. Then an 802.11 sender trying to get its time slot in a crowded network will have to drop a packet, requiring the TCP sender to retransmit the packet instead. The TCP sender will think it's congestion (not entirely wrong) and reduce its window (not entirely wrong either). How appropriate TCP's cwnd reduction is probably depends on how "true" the notion of congestion is... i.e., if I can buffer only one packet and just don't get to send it, or it gets a CRC error ("collides" in the air), then that can be seen as a pure matter of luck. Then I provoke a sender reaction that's like the old story of TCP misinterpreting random losses as a sign of congestion. I think in most practical systems this old story is now a myth, because wireless equipment will try to buffer data for a relatively long time instead of exhibiting sporadic random drops to upper layers. That is, in principle, a good thing - but buffering too much has of course all the problems that we know. Not an easy trade-off at all, I think.
>>>
>>> In this case the loss is a direct sign of congestion.
>>
>> "This case" - I talk about different buffer lengths. E.g., take the minimal buffer that would just function, and set retransmissions to 0. Then a packet loss is a pretty random matter: just because you and I contended doesn't mean that the net is truly "overloaded". So my point is that the buffer creates a continuum from "random loss" to "actual congestion" - we want loss to mean "actual congestion", but how large should the buffer be to meaningfully convey that?
>>
>>> Remember that TCP was developed back in the days of 10base2 networks, where everyone on the network was sharing a wire and it was very possible for multiple senders to start transmitting on the wire at the same time, just like with radio.
>>
>> Cable or wireless: is one such occurrence "congestion"? I.e., is halving the cwnd really the right response to that sort of "congestion"? (Contention, really.)
>
> Possibly not, but in practice it may be 'good enough'.
>
> But to make it work well, you probably want to play games with how much you back off, and how quickly you retry if you don't get a response.
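To make those "games" concrete, here is a minimal sketch of the DCF-style behaviour being discussed: double the contention window on every failed attempt, give up after a fixed retry limit, and only then surface the loss to TCP. Purely illustrative - the constants are assumptions rather than exact 802.11 parameters, and a real MAC works in slot times rather than function calls:

import random

# Illustrative constants (assumptions, not exact 802.11 values):
CW_MIN = 15        # initial contention window, in backoff slots
CW_MAX = 1023      # cap on the contention window
RETRY_LIMIT = 7    # attempts before the frame is dropped

def try_send(p_collision):
    """Send one frame with binary exponential backoff.

    p_collision models the chance that any single attempt collides
    with another sender. Returns (delivered, attempts_used).
    """
    cw = CW_MIN
    for attempt in range(1, RETRY_LIMIT + 1):
        _slot = random.randint(0, cw)     # pick a backoff slot (waiting not modelled)
        if random.random() > p_collision:
            return True, attempt          # got the medium: success
        cw = min(2 * cw + 1, CW_MAX)      # collision: double the window, retry
    return False, RETRY_LIMIT             # give up: TCP will see a loss

if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        drops = sum(1 for _ in range(10000) if not try_send(p)[0])
        print(f"p_collision={p}: {drops / 100:.1f}% of frames dropped")

With p_collision = 0.1 essentially nothing is dropped (0.1^7 is negligible), while at 0.9 roughly 0.9^7, i.e. about 48%, of frames exhaust the retry limit. Under heavy enough contention a fixed retry budget does eventually turn contention into loss that TCP reads as congestion - exactly the continuum described above.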
> The fact that the radio link can have its own ack for the packet can actually be an improvement over doing it at the TCP level, as it only needs to ack/retry for that hop; if that hop was good, there's far less of a need to retry if the server is just slow.

Yep... I remember a neat paper from colleagues at Trento University that piggybacked TCP's ACKs on link-layer ACKs, thereby avoiding the collisions between TCP's ACKs and other data packets - really nice. Not sure if it wasn't just simulations, though.

> So if we try to do the retries in the OS stack, it will need to know the difference between "failed to get out the first hop due to collision" and "got out the first hop, waiting for the server across the globe to respond", with different timeouts/retries for them.
>
>>> A large part of the problem with high-density wifi is that it just wasn't designed for that sort of environment, and there are a lot of things it does that work great for low-density, weak-signal environments but just make the problem worse for high-density environments:
>>>
>>> batching packets together
>>> slowing down the transmit speed if you aren't getting through
>>
>> Well... this *should* only happen when there's an actual physical signal quality degradation, not just collisions. At least minstrel does quite a good job of ensuring that, most of the time.
>
> "should" :-)
>
> But can the firmware really tell the difference between quality degradation due to interference and collisions with other transmitters?

Well, with heuristics it can, sort of. As a simple example from one older mechanism, consider: multiple consecutive losses are *less* likely to come from random collisions than from link noise. That sort of thing (a tiny sketch of such a classifier follows below). Minstrel worked best in our tests, using tables of rates that worked well / didn't work well in the past:
http://heim.ifi.uio.no/michawe/research/publications/wowmom2012.pdf

>>> retries of packets that the OS has given up on (including when the user has closed the app that sent them)
>>>
>>> Ideally we want the wifi layer to be just like the wired layer: buffer only what's needed to get it on the air without 'dead air' (where the driver is waiting for the OS to give it more data). At that point, we can do the retries from the OS as appropriate.
>>>
>>>> I have two questions: 1) is my characterization roughly correct? 2) have people investigated the downsides (negative effect on TCP) of buffering *too little* in wireless equipment? (I suspect so?) Finding where "too little" begins could give us a better idea of what the ideal buffer length should really be.
>>>
>>> Too little buffering will reduce the throughput as a result of unused airtime.
>>
>> So that's a function of, at least: 1) the incoming traffic rate; 2) the number of retries * f(MAC behavior; number of other senders trying).
>
> Incoming to the AP, you mean?

Incoming to whoever is sending and would be retrying - mostly the AP, yes.

> It also matters whether you are worrying about aggregate throughput for a lot of users or per-connection throughput for a single user.
>
> From a sender's point of view, if it takes 100 time units to send a packet and 1-5 time units to queue the next packet for transmission, you lose a few percent of your possible airtime and there's very little concern.
>
> But if it takes 10 time units to send the packet and 1-5 time units to queue the next packet, you have just lost a lot of potential bandwidth.
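As promised above, a tiny sketch of that consecutive-loss heuristic. It is purely illustrative: the run threshold is an assumed tuning knob, not a value taken from minstrel or the paper. The idea is that collisions hit frames more or less independently, while link noise persists, so a run of back-to-back losses points to noise:

def classify_loss(loss_history, run_threshold=3):
    """Guess whether recent losses stem from link noise or collisions.

    loss_history: list of booleans, newest last (True = frame lost).
    run_threshold: assumed knob; this many consecutive losses is taken
    as a sign of link noise, since independent collisions rarely line
    up back-to-back.
    """
    run = 0
    for lost in reversed(loss_history):
        if not lost:
            break
        run += 1
    return "noise" if run >= run_threshold else "collision"

# A rate controller would only slow down on "noise": dropping the rate
# in response to collisions makes each frame occupy more airtime and
# feeds the very contention it is reacting to.
print(classify_loss([False, True, True, True]))  # -> noise
print(classify_loss([True, False, True]))        # -> collision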
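David's time-unit arithmetic is worth making explicit: the usable fraction of airtime is t_send / (t_send + t_gap), where t_gap is the dead air spent waiting for the OS to queue the next packet. A short illustration, using the numbers from the mail above:

def airtime_efficiency(t_send, t_gap):
    """Fraction of the channel time actually spent transmitting."""
    return t_send / (t_send + t_gap)

print(f"{airtime_efficiency(100, 5):.0%}")  # ~95%: slow link, the gap barely matters
print(f"{airtime_efficiency(10, 5):.0%}")   # ~67%: fast link, a third is dead air

So the faster the link, the more a fixed per-packet gap costs a single sender - which is why keeping at least one packet ready (and batching packets together, as mentioned above) matters so much more at high rates.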
> But from the point of view of the aggregate, these gaps just give someone else a chance to transmit and have very little effect on the amount of traffic arriving at the AP.
>
> I was viewing things from the point of view of the app on the laptop.

Yes... I agree, and that's the more common + more reasonable way to think about it. I tend to think upstream, which of course is far less common, but maybe even more problematic. Actually, I suspect the following: things get seriously bad when a lot of senders are sending upstream together. This doesn't really happen much in practice - BUT when we have a very, very large number of hosts connected in a conference-style situation, all the HTTP GETs and SMTP messages and whatnot *do* create lots of collisions, a situation that isn't really too common (and maybe not envisioned / parametrized for), and that's why things often get so bad. (At least one of the reasons.)

>>> But at the low data rates involved, the system would have to be extremely busy for this to add up to a significant amount of time, even if only one packet at a time is buffered.
>>>
>>> You are also conflating the effect of the driver/hardware buffering with it doing retries.
>>
>> Because of the "function" I wrote above: the more you retry, the more you need to buffer when traffic continuously arrives, because you're stuck trying to send a frame again.
>
> Huh, I'm missing something here; retrying sends would require you to buffer more when sending.

Aren't you saying the same thing as I am? Sorry if I expressed it confusingly somehow.

> If people are retrying when they really don't need to, that cuts down on the available airtime.

Yes.

> But if you have continual transmissions taking place, so you have a hard time getting a chance to send your traffic, then you really do have congestion and should be dropping packets to let the sender know that it shouldn't try to generate as much.

Yes; but the complexity that I was pointing at (though maybe it's a simple parameter, more like a 0-or-1 situation in practice?) lies in the word "continual". How long do you try before you decide that the sending TCP should really think it *is* congestion? To really optimize the behavior, that would have to depend on the RTT, which you can't easily know (a sketch of what an RTT-scaled retry deadline might look like follows below).

Cheers,
Michael
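On that last point, a sketch of what an RTT-scaled retry deadline could look like. Entirely hypothetical: the function, the factor k, and the idea that the link layer knows an RTT estimate at all are assumptions - as noted above, a driver normally can't easily know per-flow RTTs, which is precisely the difficulty:

import random
import time

def send_with_rtt_budget(transmit, srtt_estimate, k=0.5):
    """Retry a frame only while a fraction of the flow's RTT remains.

    transmit() models one MAC-layer attempt (returns True on success).
    k is an assumed tuning factor: retrying for longer than k * RTT
    mostly delays TCP's own loss recovery, so past the deadline the
    frame is dropped and TCP gets to interpret the loss as congestion.
    """
    deadline = time.monotonic() + k * srtt_estimate
    while time.monotonic() < deadline:
        if transmit():
            return True
    return False  # budget exhausted: surface the loss to TCP

# Toy usage: each attempt takes ~1 ms and succeeds 30% of the time.
def attempt():
    time.sleep(0.001)
    return random.random() < 0.3

print(send_with_rtt_budget(attempt, srtt_estimate=0.020))  # ~10 ms budget

A flow with a 20 ms RTT would get roughly 10 ms of link-layer persistence before the loss is exposed, while a 200 ms intercontinental flow could be shielded for around 100 ms - that is the RTT dependence the question above points at.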
