Adding in the BBR list. (I have no problem with this discussion moving entirely to the BBR list...)
The flent dataset we're looking at is here:

http://blog.cerowrt.org/flent/bbr-comprehensive.tgz # this might get bigger, I have pfifo tests running now

My description of what I was doing and what I observed starts here:

https://lists.bufferbloat.net/pipermail/cerowrt-devel/2016-September/010858.html

where I basically set up a 48ms, 20Mbit internet emulation and hit it
with every combination of cubic and bbr, with and without ecn, against
all forms of queue management (bfifo, pie, fq_codel, cake (with and
without fq)).

On Wed, Sep 21, 2016 at 5:40 AM, Mikael Abrahamsson <swm...@swm.pp.se> wrote:
> On Wed, 21 Sep 2016, Dave Taht wrote:
>
>> * It seriously outcompetes cubic, particularly on the single queue aqms.
>> fq_codel is fine. I need to take apart the captures to see how well it is
>> behaving in this case. My general hope was that with fq in place, anything
>> that was delay based worked better as it was only competing against itself.
>
>
> I'm looking at 4up-sqwave-fq_bfifo-256k. Is this really fq_bfifo, or just
> bfifo? Looks like there is no fq.

The structure of the test naming is
srcqdisc-bottleneckqdisc-otherparameters, so this was coming from
sch_fq, going through bfifo set for 256k in either direction. Sorry
for the confusion.

> If someone doesn't have the correct Flent available, I posted two
> screenshots here: http://imgur.com/a/cFtMd

You can get a bit more detail by zooming in on the plot via the
controls. The spike at the end is an artifact we sometimes get from
the dataset, which messes up the auto-scaling. We also tend to kick
into a log scale automatically more often than I'd like; you can
disable log scaling via the menu. Also, you can save plots in any
format (no need for a screenshot), and doing comparison plots is easy
with the file browser or data->add other open files.

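
Going back to the test naming for a moment: here is a minimal Python
sketch of how a name like 4up-sqwave-fq_bfifo-256k could be split
into its parts. It is only an illustration, not something flent
provides. The function name, the ParsedName type, and the rule that
the first field containing "_" is "<srcqdisc>_<bottleneckqdisc>" are
all assumptions inferred from that single example.

```python
from typing import NamedTuple, Tuple


class ParsedName(NamedTuple):
    test: str                # leading fields, e.g. "4up-sqwave"
    src_qdisc: str           # qdisc on the sender, e.g. "fq"
    bottleneck_qdisc: str    # qdisc at the bottleneck, e.g. "bfifo"
    params: Tuple[str, ...]  # whatever follows, e.g. ("256k",)


def parse_test_name(name: str) -> ParsedName:
    """Split a test name using an ASSUMED layout:
    <test fields>-<src>_<bottleneck>-<other parameters>.
    """
    fields = name.split("-")
    for i, field in enumerate(fields):
        if "_" in field:
            # Split on the first "_" only, so a bottleneck such as
            # "fq_codel" keeps its own underscore.
            src, bottleneck = field.split("_", 1)
            return ParsedName("-".join(fields[:i]), src, bottleneck,
                              tuple(fields[i + 1:]))
    raise ValueError("no <src>_<bottleneck> field found in %r" % name)


if __name__ == "__main__":
    print(parse_test_name("4up-sqwave-fq_bfifo-256k"))
```

Under that assumed layout, the example name reads as "sender uses fq,
bottleneck uses bfifo, extra parameter 256k", which matches the
explanation above.
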
I'm perpetually taking a first slice at the problem with

flent-gui *noecn*.gz

or

flent-gui *{cake,pie}*-noecn*.gz

before resorting to the file browser to combine plots more
intelligently.

Thx for taking a look and posting your pics!

> What I think I see:
>
> The flows are started in order: "BBR1, CUBIC2, BBR4, CUBIC3" (a bit
> confusing, but according to your description).

Your interpretation of the sequencing is correct. BBR1 starts first,
cubic2 starts 3 seconds later, BBR4 3 seconds after that, and cubic3 3
seconds after that. Each flow runs for 60 seconds.

I will change the labeling (and add more plot types) after the coffee
kicks into my system.

Other suggestions for structuring an A/B test like this are welcome.
Prior to this I'd had something similar that started things on 5
second intervals. That "beat" against BBR's 10 second interval, so I
went with 3 second intervals, alternating the cc types. I could switch
to primes?

I could also use a better name for the "tcp_4up_squarewave" test
itself; it's not a "square wave".

> So it looks like BBR1 fills the pipe within half a second or so, nice steady
> state. Then CUBIC2 starts, and slowly over a few seconds, starts to starve
> BBR1 of BW, it looks like steady state here would be that CUBIC2 would end
> up with around 65-70% of the BW, and BBR1 getting 30-35%.

I can't draw that conclusion from 3 seconds' worth of competing data
on drop tail, so a longer test that staggers the start of the two
different cc's further apart is needed.

> Then BBR4 comes
> along (10 seconds in), and just KILLS them both, smacks them over the head
> with a hammer, taking 90% of the BW, wildly oscillating in BW way above 20b
> megabit/s down to 10.

Latecomer advantage. And it's not "oscillating above 20mbits" per se
(that's impossible on a 20Mbit link). There's packet loss, so what
flent reports is the "delivered" throughput, which oscillates as the
losses get filled in. I was sampling on a 50ms interval here, which is
as low as flent can go, and close to the RTT.
Were I sampling at flent's higher default (200ms) interval, or running
at a lower RTT, fewer "oscillations" would have been apparent. I have
been bitten *badly* lately by sampling at intervals longer than
Nyquist would recommend, and here I would have preferred to sample at
1/2 the baseline RTT.

This would be a lot easier to see from a capture, looking at sacks
and/or cwnd.

> The ping here goes up to around 150-160ms. CUBIC3
> starts at 15 seconds and get basically no bw at all.

Single queues suck.

> Then at around 22 seconds in, I guess pretty close to 12-13 seconds after
> BBR4 was started, BBR4 starts to calm down, slowly letting the other streams
> come back to life. At around 30 seconds, they all seem to get at least a bit
> of the bw each and nobody is completely starved, but BBR1 seems to not get
> much BW at all (very dotted line).

At T+28 it looks to me as though all flows have got close to their
fair share (this is quite a long path, so it would take ages to see the
real long-term behavior), and after flows start to die off (60 seconds
each, offset by their starting delay), the remaining flows seem to
grab back more or less what they should.

I can grow the test from 60 to 600 seconds and stagger the starts out
accordingly... or shorten the RTT... or the buffer size. I ran a
string of tests against pfifo_1000 while I was asleep; I haven't
looked at them yet.

> When at the end there is only CUBIC3 and BBR4 left, it looks like BBR4 has a
> 2/3 to 1/3 advantage.

Too little detail to tell. Possibly. But maybe not, after another
10sec.

> Looking at cake_flowblind_noecn, BBR1 and BBR4 just kills both CUBIC flows.
> Same with PIE.

Yep. The single queue AQMs expect their induced drops to matter to all
flows. BBR disregards them as noise. I think there's hope though, if
BBR can treat ECN CE as a clear indication of congestion and not
filter it as it does drops.

But cake/fq_codel is just fine with different cc's in the mix, and I'm
dying to look at the captures for what happens there.

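
On the sampling-interval point earlier in this reply, the arithmetic
can be sketched in a few lines of Python. This is only an
illustration: the function names are made up here, and it assumes,
purely for the sake of the example, a throughput fluctuation whose
period equals the 48ms baseline RTT. It uses the textbook aliasing
rule (the signal frequency folds to the nearest multiple of the
sampling frequency) and nothing specific to flent.

```python
import math


def nyquist_interval_ms(period_ms: float) -> float:
    """Longest sampling interval that still gives two samples per period."""
    return period_ms / 2.0


def apparent_period_ms(true_period_ms: float, sample_interval_ms: float) -> float:
    """Period a periodic signal appears to have after uniform sampling.

    Returns math.inf when the samples always land on the same phase,
    so the fluctuation would look like a constant.
    """
    f_signal = 1.0 / true_period_ms
    f_sample = 1.0 / sample_interval_ms
    # Fold the signal frequency onto the nearest multiple of the
    # sampling frequency; what is left over is the aliased frequency.
    alias = abs(f_signal - round(f_signal / f_sample) * f_sample)
    return math.inf if alias == 0 else 1.0 / alias


if __name__ == "__main__":
    rtt_ms = 48.0  # baseline RTT of the emulated path
    print("interval Nyquist would ask for:", nyquist_interval_ms(rtt_ms), "ms")
    for interval_ms in (24.0, 50.0, 200.0):
        print(interval_ms, "ms sampling ->",
              apparent_period_ms(rtt_ms, interval_ms), "ms apparent period")
```

Under that assumption, a 48ms fluctuation sampled every 50ms shows up
as a slow swing with a period of about 1.2 seconds, which is the kind
of artifact that makes a plot look more "oscillatory" than the flow
really is.
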
> So it seems my intuition was wrong, at least for these scenarios. It wasn't
> CUBIC that would kill BBR, it's the other way around.

My intuition was that "delay based TCPs can't work on the internet!",
and that was wrong, also.

> Great to have testing
> tools! Thanks Flent!

Thx, toke! I try not to remember just how hard it was to do this sort
of analysis on complex network flows when we started.

>
> --
> Mikael Abrahamsson email: swm...@swm.pp.se

--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org