Hi Dave,
On Jan 17, 2014, at 00:12 , Dave Taht <dave.t...@gmail.com> wrote:

> On Thu, Jan 16, 2014 at 5:56 PM, Sebastian Moeller <moell...@gmx.de> wrote:
> Hi Dave,
>
> many thanks for all the information & elucidation, as always.
>
> I enjoy trying to find the words to explain.
>
> On Jan 16, 2014, at 23:30 , Dave Taht <dave.t...@gmail.com> wrote:
>
>> On Thu, Jan 16, 2014 at 10:29 AM, Sebastian Moeller <moell...@gmx.de> wrote:
>>> Hi Aaron,
>>>
>>> On Jan 16, 2014, at 16:03 , Aaron Wood <wood...@gmail.com> wrote:
>>>
>>>> All,
>>>>
>>>> I'm noting this here in case anyone is interested. After I write this up,
>>>> I'm going to start from scratch on the configuration, and factory-reset
>>>> the router.
>>>>
>>>> =====
>>>>
>>>> The 5GHz radio on my 3800 seems to be in a very odd state. I'm not quite
>>>> sure what state it's in, but it seems to be only doing HT20 1x1. And in a
>>>> fairly broken manner at that.
>>>>
>>>> Running the rrul test (over wifi directly to the router as the netserver),
>>>> tcp uploads were 25Mbps or so, but download was 5Mbps.
>>>
>>> This is with your mac? Try rrul_noclassification, macosx (at least
>>> 10.8) will not do RRUL fair to a fast host. Why I do not know… it always
>>> prioritizes the upload, as if it did not see/trust the downstream markings
>>> (heck maybe it is busy using all bandwidth for upstream so that it
>>> literally never sees the markings on the downstream packets..)
>>
>> rrul with classification blows up 802.11e on all devices, everywhere.
>> The VO and VI queues generally get all the bandwidth.
>> Been saying that a while. VO and VI should be strictly admission
>> controlled and are not, anywhere. All the queues fill
>> and bad things happen. What should happen in a 802.11n world is that a
>> set of packets should wind up in the best queue for the TXOP, and VO
>> used not at all.
>>
>> rrul_noclassification better looks like the intent for classification
>> was for 802.11e and thus works better.
>> There are a couple other tests in the netperf-wrapper suite that don't use
>> classification at all, that might be saner to use.
>
> Ah, so in rrul_noclassification, the UDP flows still are tos marked (at
> least that is reported in the plots and visible in the plots), but even using
> tcp_bidirectional I see a crazy imbalance 80:1, so this laptop's Broadcom
> BCM43xx (apple is not as informative as I would like about the components,
> but the firmware marker points at broadcom I would say) isn't better than the
> intel wifi in yours I would say…
>
> the iwl is a nightmare. the 802.11ac stuff is looking bad too.
>
> Another issue with the current implementation of rrul is my intent with the
> specification was to test voip-like streams, an
> isochronous 10ms packet in each direction.
>
> The implementation currently sends measurement flows based on the RTT, just
> like ping. As the RTT declines in length,
> the amount of "space" used up by the measurement flow gets bigger and bigger.
> At a 3ms RTT, just the EF measurement
> flow eats ~2/3 of the available txops as it runs through the VO queue, which
> is limited to a single packet per txop.

So, how much data could one fit into a txop? Would it make sense for the
driver to "pad" the VO txop with other data just to efficiently use the air
bottleneck?

> The other measurement flows like the CS5 flow, eat the VI queue, and the BE
> and BK queues get starved for txops.

Ah so this is why I only see the TOS UDP data in the rrul_noclassification
test, as they are otherwise crowded out by the tcp streams of same class, and
not reported after the first drop...

>
> I can barely explain to myself how the queues are supposed to get airtime
> scheduled, see the 802.11e page on wikipedia. I thought 802.11e was a bad
> idea in the first place...
> but what rrul does is try to get txops on all 4
> queues, which means it
> needs 4x as much airtime (this is not accurate), and grabs airtime for its
> VO queue first most of the time, followed by
> VI, BE, and BK.
>
> I think for wifi testing with the current rrul test there needs to be a new
> test that does everything in BE. (toke?)
> Classification is very rarely used in the real world anyway.

So that means the UDP streams as well?

>
> Most of the usage of rrul to date has been over longer RTTs over ethernet...
> (again, I'm delighted y'all are doing this,
> and I do hope to get a more voip-like test)

Yeah netperf-wrapper has been a delight in getting the ATM mess sorted out,
great work. And now with the successor in the works things will get even
better :)

>
> <tcp_bidirectional_hms-beagle_2_cerowrt.png>
>
>>
>> lastly, if you are doing a test over the internet, many providers pee
>> on the tos bits. Unless you've done a packet capture, you can't trust
>> that you are actually seeing classified packets coming back from the
>> internet.
>
> Good point, comparing just the local rrul plots with the ones to demo,
> I see what you mean, there is a tiny bit of the priority classes visible in
> the uplink (but barely) and none at all in the downlink, so my ISP does not
> think too much of the tos bits (I guess the tos effect on the uplink is from
> what cero is doing and since cero controls the bottleneck some "imprint"
> remains to be seen at packet reception time at demo, or so I think...).
>
> simple.qos respects 3 of the 4 tiers that wifi does.
>
> simplest does not.

I know, even though I have no real use case I like the general idea of having
dedicated bandwidth-limited channels for normal, important, and background
traffic. Sort of just in case, belt and suspender kind of thinking.

>
>>
>> One of the things I hope to fix with the twd effort is to detect tos
>> bit preservation and note it in the test.
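
As a rough illustration of what "detecting tos bit preservation" could mean,
here is a minimal sketch. It is not how twd or netperf-wrapper does it; the
function names are made up for this example, and it assumes you already know
the TOS byte that was sent and the one seen in a capture on the receiving
side. The only facts relied on are that the upper six bits of the byte are the
DSCP and the lower two are ECN, so EF (DSCP 46) is 0xb8 and CS5 (DSCP 40) is
0xa0 on the wire.

```python
def split_tos(tos):
    """Split a TOS / Traffic Class byte into (dscp, ecn)."""
    return tos >> 2, tos & 0x3


def tos_preservation(sent_tos, received_tos):
    """Compare the DSCP of a sent and a received TOS byte.

    Hypothetical helper for illustration only. ECN bits are ignored because
    a router may legitimately change them along the path.
    """
    sent_dscp, _ = split_tos(sent_tos)
    recv_dscp, _ = split_tos(received_tos)
    if recv_dscp == sent_dscp:
        return "preserved"
    if recv_dscp == 0:
        return "bleached"   # marking reset to best effort on the path
    return "remarked"       # marking rewritten to some other class


if __name__ == "__main__":
    print(split_tos(0xb8))               # EF
    print(tos_preservation(0xb8, 0x00))  # EF sent, best effort received
```

A test run would record one such verdict per marked flow and direction, which
is the kind of note the plots currently lack.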
>>
>> I'm delighted y'all are seeing these results for yourselves. Getting
>> dinged on bandwidth after aiming for low latency by the public is not
>> something I'd wanted to happen with a "stable" release. Regrettably
>> fixing the drivers to work better only has
>> felix working on it in his spare time, and I've been trying to clear
>> my plate for months to help do the delicate rework
>> required. (or recruit others to help)
>
> I would love to help, but this is far out of my league and area of
> expertise…
>
> yer helping plenty, and the more people that "get this", the sooner people
> will work
> on fixing it. I have enjoyed trying to explain these behaviors today. Someday
> once we have words that match the concepts they will make sense to a CTO.
>
> I have been very pleased by googling for bufferbloat of late. Almost everyone
> that
> has talked about it on the web for the past month seems to get it.
>
> So if we start now, and make this the year of "make-wifi-fast", in a couple
> years
> maybe the world will get it...
>
> ... sadly long after 802.11ac is fully deployed and messing up everything for
> everybody.

"Make wifi fast" is a pretty good motto…

Best regards
Sebastian

> best
> Sebastian
>
>>
>>> About the other issue I do not know anything…
>>>
>>> Best Regards
>>> Sebastian
>>>
>>>> This is me 1-2 meters from the router. Load was never more than 0.33. (I
>>>> can share the results if people are interested).
>>>>
>>>> After a full power cycle, wifi isn't coming up at all.
>>>>
>>>> =====
>>>>
>>>> How I got here:
>>>>
>>>> I'm in France, and had dutifully set my unit with the FR country code when
>>>> setting up CeroWRT. I had noticed some odd latencies (periodic 100-200ms
>>>> latency every 10-20 seconds over wifi) on the 5GHz network. The router
>>>> was on channel 36, and I wanted to move it up to the far-upper ranges, so
>>>> I tried to specify a "custom" channel to do so (140).
>>>> This was the channel I thought I had been using with stock (Netgear)
>>>> firmware.
>>>>
>>>> Wifi didn't come back up after applying the changes, and the luci
>>>> interface seemed to be tripping up over stuff that it was reading out of
>>>> the configuration files.
>>>>
>>>> I ssh'd in via ethernet, and fixed up the configurations by hand.
>>>>
>>>> Except the driver is still reporting that the 5GHz network won't kick into
>>>> 802.11n modes, and won't use HT40. It seems to be sure it's configured
>>>> for it, but isn't using it.
>>>>
>>>> Further, digging into the rc_stats files with the minstrel speeds, I found
>>>> some very odd data (not what I was expecting to see):
>>>>
>>>> (laptop, which can do 2x2 HT40)
>>>>     rate  throughput  ewma prob  this prob  this succ/attempt  success  attempts
>>>> D      6         6.0       99.9      100.0              2( 2)       65        65
>>>>        9         0.0        0.0        0.0              0( 0)        0         0
>>>>       12         2.9       25.0      100.0              0( 0)        1         1
>>>>       18         4.3       25.0      100.0              0( 0)        1         1
>>>>       24         5.6       25.0      100.0              0( 0)        1         1
>>>> A P   36        32.4       99.9      100.0              0( 0)       51        51
>>>> C     48        10.4       25.0      100.0              0( 0)        1         1
>>>> B     54        11.5       25.0      100.0              0( 0)        1         1
>>>>
>>>> Total packet count:: ideal 53 lookaround 7
>>>>
>>>> (AppleTV, 1x1 HT20)
>>>> root@cerowrt:/sys/kernel/debug/ieee80211/phy1/netdev:sw10# cat stations/58\:55\:ca\:51\:b5\:4b/rc_stats
>>>>     rate  throughput  ewma prob  this prob  this succ/attempt  success  attempts
>>>>        6         3.5       57.8      100.0              0( 0)        6         6
>>>>        9         3.9       43.7      100.0              0( 0)        2         2
>>>>       12         5.1       43.7      100.0              0( 0)        2         2
>>>>       18        10.0       57.8      100.0              0( 0)        3         3
>>>> D     24        13.1       57.8      100.0              0( 0)        3         3
>>>> C     36        14.2       43.7      100.0              0( 0)        2         2
>>>> B     48        18.2       43.7      100.0              0( 0)        2         2
>>>> A P   54        46.2       99.9      100.0              1( 1)      348       367
>>>>
>>
>> No AMPDUs. Hmm. Might be a bug.
>>
>>>> Total packet count:: ideal 331 lookaround 37
>>
>> Hmm. The radios are set for HT20 for the 2.4ghz and HT40+ for the
>> 5ghz. I note that
>> HT40 in wireless-n the 8 channels used up need to be congruent.
>>
>> HT40+ is 36+40, and 44+48 for example. You can't do 40+44.
>>
>> Availability of HTXX is dependent upon your regulatory domain.
>>
>>>> Whereas what I'm seeing for the 2.4GHz radio is:
>>>>
>>>> root@cerowrt:/sys/kernel/debug/ieee80211/phy0/netdev:sw00/stations# cat 10\:9a\:dd\:30\:96\:34/rc_stats
>>>> type           rate  throughput  ewma prob  this prob  retry  this succ/attempt  success  attempts
>>>> CCK/LP        1.0M         0.7      100.0      100.0      0              0( 0)        2         2
>>>> CCK/SP        2.0M         0.0        0.0        0.0      0              0( 0)        0         0
>>>> CCK/SP        5.5M         0.0        0.0        0.0      0              0( 0)        0         0
>>>> CCK/SP       11.0M         0.0        0.0        0.0      0              0( 0)        0         0
>>>> HT20/LGI      MCS0         5.6      100.0      100.0      1              0( 0)        2         2
>>>> HT20/LGI      MCS1         0.0        0.0        0.0      0              0( 0)        0         0
>>>> HT20/LGI      MCS2         0.0        0.0        0.0      0              0( 0)        0         0
>>>> HT20/LGI      MCS3         0.0        0.0        0.0      0              0( 0)        0         0
>>>> HT20/LGI      MCS4         0.0        0.0        0.0      0              0( 0)        0         0
>>>> HT20/LGI      MCS5        30.3      100.0      100.0      5              0( 0)        1         1
>>>> HT20/LGI t    MCS6        32.5      100.0      100.0      5              0( 0)       11        11
>>>> HT20/LGI T P  MCS7        35.0      100.0      100.0      5              6( 6)       34        34
>>>>
>>>> Total packet count:: ideal 45 lookaround 3
>>>> Average A-MPDU length: 1.3
>>
>> You are doing good at the highest possible rate. However packet
>> aggregation is pretty terrible.
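
The HT40+ pairing rule mentioned above (36+40 and 44+48 work, 40+44 does not)
can be sketched as a small lookup. This is only an illustration: the pair
table is an assumption listing the commonly used 5 GHz 40 MHz pairs, the
function name is made up, and which pairs are actually permitted depends on
the regulatory domain, which this sketch ignores.

```python
# Commonly listed 5 GHz 40 MHz channel pairs as (lower, upper).
# Assumption for illustration; regulatory limits are not modelled.
HT40_PAIRS = [
    (36, 40), (44, 48), (52, 56), (60, 64),
    (100, 104), (108, 112), (116, 120), (124, 128), (132, 136),
    (149, 153), (157, 161),
]


def ht40_secondary(primary, mode):
    """Return the secondary channel for `primary`, or None if not pairable.

    `mode` is "HT40+" (secondary above the primary) or "HT40-" (below).
    Hypothetical helper, not part of any driver or OpenWrt tool.
    """
    for lower, upper in HT40_PAIRS:
        if mode == "HT40+" and primary == lower:
            return upper
        if mode == "HT40-" and primary == upper:
            return lower
    return None


if __name__ == "__main__":
    for channel in (36, 40, 44, 48, 140):
        print(channel, ht40_secondary(channel, "HT40+"),
              ht40_secondary(channel, "HT40-"))
```

Under this table a primary of 48 only pairs downward (HT40-, with 44), and 140
has no partner at all, so it is worth checking both the channel and the htmode
when a 5GHz radio refuses to come up in HT40.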
>>
>>>> And here are radio blocks from the current /etc/config/wireless:
>>>>
>>>> config wifi-device 'radio1'
>>>>         option type 'mac80211'
>>>>         option macaddr '28:c6:8e:bb:9a:49'
>>>>         list ht_capab 'SHORT-GI-40'
>>>>         list ht_capab 'TX-STBC'
>>>>         list ht_capab 'RX-STBC1'
>>>>         list ht_capab 'DSSS_CCK-40'
>>>>         option txpower '17'
>>>>         option distance '25'
>>>>         option channel '48'
>>>>         option country 'US'
>>>>
>>>> config wifi-device 'radio0'
>>>>         option type 'mac80211'
>>>>         option hwmode '11ng'
>>>>         option macaddr '28:c6:8e:bb:9a:47'
>>>>         option htmode 'HT20'
>>>>         list ht_capab 'SHORT-GI-40'
>>>>         list ht_capab 'TX-STBC'
>>>>         list ht_capab 'RX-STBC1'
>>>>         list ht_capab 'DSSS_CCK-40'
>>>>         option txpower '26'
>>>>         option country 'FR'
>>>>         option distance '15'
>>>>         option channel 'auto'
>>
>> I don't know anyone that has fiddled with distance to such an extent.
>> your country codes need to be the same and you should look at what
>> is allowed in FR.
>>
>>>> ======
>>>>
>>>> Some notes after having repaired the situation:
>>>>
>>>> - The pci paths to the radios was missing from /etc/config/wireless,
>>>> that's the only thing that I saw that seemed grossly out of place.
>>>>
>>>> - Back up and running, and yes, it's much happier, now. Over wifi I get
>>>> 60-70Mbps upload and ~40Mbps download (running rrul). Latency sucks.
>>>> Wifi has some ugly bufferbloat. (although these results are somewhat in
>>>> question when the router has a 1m load average over 5.0...)
>>
>> Trying to measure the one way delay here is important (and hard. The
>> only tool I've found for it so far was owamp, so I'm trying to write
>> that test in twd). A TON of your delay is coming from your client. A
>> network connection is like a fountain, or a toilet, both sides of the
>> flow count...
>>
>>>>
>>>> - Enabling all the SQM features I was having previously also considerably
>>>> cleaned up wifi performance.
>>>> It's more balanced, but still not nearly as
>>>> balanced as I see on gigabit ethernet.
>>>>
>>>> -Aaron
>>>> _______________________________________________
>>>> Cerowrt-devel mailing list
>>>> Cerowrt-devel@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>
>> --
>> Dave Täht
>>
>> Fixing bufferbloat with cerowrt:
>> http://www.teklibre.com/cerowrt/subscribe.html