> From: Bob McMahon via Bloat <bl...@lists.bufferbloat.net>
> Date: Wed, 2021-07-14 at 11:38 AM
>
> One challenge I faced with iperf 2 was around flow control's effects on
> latency. I find if iperf 2 rate limits on writes then the end/end
> latencies, RTT look good because the pipe is basically empty, while rate
> limiting reads to the same value fills the window and drives the RTT up.
> One might conclude, from a network perspective, the write side is
> better. But in reality, the write rate limiting is just pushing the
> delay into the application's logic, i.e. the relevant bytes may not be
> in the pipe but they aren't at the receiver either, they're stuck
> somewhere in the "tx application space."
>
> It wasn't obvious to me how to address this. We added burst measurements
> (burst xfer time, and bursts/sec) which, I think, helps.

...

>>> I find the assumption that congestion occurs "in network" as not always
>>> true. Taking OWD measurements with read side rate limiting suggests that
>>> equally important to mitigating bufferbloat driven latency using congestion
>>> signals is to make sure apps read "fast enough" whatever that means. I
>>> rarely hear about how important it is for apps to prioritize reads over
>>> open sockets. Not sure why that's overlooked and bufferbloat gets all the
>>> attention. I'm probably missing something.
Hi Bob,

You're right that the sender generally also has to avoid sending more than the receiver can handle to avoid delays in a message-reply cycle on the same TCP flow. In general, I think of failures here as application faults rather than network faults. While important for user experience, it's something that an app developer can solve, and that's importantly different from network buffering.

It's also somewhat possible to avoid getting excessively backed up in the network because of your own traffic. Here bbr usually does a decent job of keeping the queues low. (And you'll maybe find that some of the bufferbloat measurement efforts are relying on the self-congestion you get out of cubic, so if you switch them to bbr you might not get a good answer on how big the network buffers are.)

In general, anything along these lines has to give backpressure to the sender somehow. What I'm guessing you saw when you did receiver-side rate limiting was that the backpressure had to fill bytes all the way back through a full receive kernel buffer (making a 0 rwnd for TCP) and a full send kernel buffer before the sender's writes start blocking (or failing with EAGAIN/EWOULDBLOCK on a non-blocking socket), and that's the first hint the sender has that it can't send more data right now. The assumption that the receiver can receive as fast as the sender can send is so common that it often goes unstated.

(If you love to suffer, you can maybe get the backpressure to start earlier, with a lower impact to your app-level RTT, if you try hard enough from the receive side with TCP_WINDOW_CLAMP:
https://man7.org/linux/man-pages/man7/tcp.7.html#:~:text=tcp_window_clamp
But you'll still be living with a full send buffer ahead of the message-response.)

But usually the right thing to do if you want receiver-driven rate control is to send back some kind of explicit "slow down, it's too fast for me" feedback at the app layer that will make the sender send slower. For instance, most ABR players will shift down their bitrate when they're failing to render video fast enough (say, because they're CPU-bound from something else churning on the machine), just as they do when the network isn't feeding them video segments fast enough. (RTP-based video players are supposed to send feedback with this same kind of "slow down" capability, and sometimes they do.) I've put some rough sketches of these points at the end of this mail.

But what you can't fix from the endpoints, no matter how hard you try, is the buffers in the network that get filled by other people's traffic. Getting other people's traffic to avoid breaking my latency when we're sharing a bottleneck requires deploying something in the network, and it's not something I can fix myself except inside my own network.

While the app-specific fixes would make for very fine blog posts or Stack Overflow questions that could help someone who managed to search the right terms, there are a lot of different approaches for different apps that can solve it more or less, and anyone who tries hard enough will land on something that works well enough for them. You don't need a whole movement to get people to make their own app work OK for them and their users. The problems can be subtle, and maybe there will be some late and frustrating nights involved, but anyone who gets it reproducible and keeps digging will solve it eventually.

But getting stuff deployed in networks to stop people's traffic breaking each other's latency is harder, especially when it's a major challenge for people to even grasp the problem and understand its causes. The only possible paths to getting a solution widely deployed (assuming you have one that works) start with things like "start an advocacy movement" or "get a controlling interest in Cisco".
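Since a couple of the points above are easier to see in code than in prose, here are the promised sketches (all untested, Linux-centric, usual caveats apply). First, switching a socket to bbr; TCP_CONGESTION is per-socket and the call fails if the tcp_bbr module isn't available:

    /* Sketch: ask the kernel for bbr congestion control on one
     * socket. Linux-only; needs the tcp_bbr module loaded. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <string.h>
    #include <sys/socket.h>

    static int use_bbr(int fd)
    {
        const char cc[] = "bbr";
        return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                          cc, strlen(cc));
    }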
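Second, the shape of that "first hint" the sender gets when the backpressure finally reaches it: on a non-blocking socket it's EAGAIN/EWOULDBLOCK from send() once the send buffer is full (a blocking socket just stalls inside write() instead). The helper name is mine:

    /* Sketch: detect backpressure on a non-blocking TCP send.
     * Returns bytes queued, 0 for "kernel says not now", and
     * -1 on real errors. */
    #include <errno.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    static ssize_t try_send(int fd, const void *buf, size_t len)
    {
        ssize_t n = send(fd, buf, len, MSG_DONTWAIT);
        if (n >= 0)
            return n;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;  /* send buffer full: rwnd backed up to us */
        return -1;
    }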
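Third, the TCP_WINDOW_CLAMP trick from the man page above would look something like this (the 64 KB number is made up, and the kernel may still round the clamp up to a minimum it likes):

    /* Sketch: cap the advertised receive window so the sender
     * feels the backpressure sooner. Linux-only. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    static int clamp_rwnd(int fd)
    {
        int clamp = 64 * 1024;  /* hypothetical 64 KB ceiling */
        return setsockopt(fd, IPPROTO_TCP, TCP_WINDOW_CLAMP,
                          &clamp, sizeof(clamp));
    }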
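And last, the app-layer "slow down" feedback is just a tiny receiver-to-sender message on the same connection. The message format here is entirely made up, but ABR players and RTP receiver reports are doing a fancier version of the same thing:

    /* Sketch of explicit receiver-driven rate control: the
     * receiver notices reads are falling behind arrivals and
     * asks the sender for a lower ceiling. Hypothetical wire
     * format, not any real protocol. */
    #include <arpa/inet.h>
    #include <stdint.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    struct rate_hint {
        uint8_t  type;      /* 1 = "slow down" (made up) */
        uint32_t max_kbps;  /* requested send-rate ceiling */
    } __attribute__((packed));

    static ssize_t ask_sender_to_slow(int fd, uint32_t kbps)
    {
        struct rate_hint h = { .type = 1,
                               .max_kbps = htonl(kbps) };
        return send(fd, &h, sizeof(h), 0);
    }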
Best,
Jake

_______________________________________________
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake