Hi Dave,

On Jan 17, 2014, at 00:12 , Dave Taht <dave.t...@gmail.com> wrote:

> 
> 
> 
> On Thu, Jan 16, 2014 at 5:56 PM, Sebastian Moeller <moell...@gmx.de> wrote:
> Hi Dave,
> 
> many thanks for all the information & elucidation, as always.
> 
> I enjoy trying to find the words to explain.
>  
> 
> On Jan 16, 2014, at 23:30 , Dave Taht <dave.t...@gmail.com> wrote:
> 
>> On Thu, Jan 16, 2014 at 10:29 AM, Sebastian Moeller <moell...@gmx.de> wrote:
>>> Hi Aaron,
>>> 
>>> On Jan 16, 2014, at 16:03 , Aaron Wood <wood...@gmail.com> wrote:
>>> 
>>>> All,
>>>> 
>>>> I'm noting this here in case anyone is interested.  After I write this up, 
>>>> I'm going to start from scratch on the configuration, and factory-reset 
>>>> the router.
>>>> 
>>>> =====
>>>> 
>>>> The 5GHz radio on my 3800 seems to be in a very odd state.  I'm not quite 
>>>> sure what state it's in, but it seems to be only doing HT20 1x1.  And in a 
>>>> fairly broken manner at that.
>>>> 
>>>> Running the rrul test (over wifi directly to the router as the netserver), 
>>>> tcp uploads were 25Mbps or so, but download was 5Mbps.
>>> 
>>>        This is with your mac? Try rrul_noclassification, macosx (at least 
>>> 10.8) will not do RRUL fair to a fast host. Why I do not know… it always 
>>> prioritizes the upload, as if it did not see/trust the downstream markings 
>>> (heck maybe it is busy using all bandwidth for upstream so that it 
>>> literally never sees the markings on the downstream packets..)
>> 
>> rrul with classification blows up 802.11e on all devices, everywhere.
>> The VO and VI queues generally get all the bandwidth.
>> Been saying that a while. VO and VI should be strictly admission
>> controlled and are not, anywhere. All the queues fill
>> and bad things happen. What should happen in a 802.11n world is that a
>> set of packets should wind up in the best queue for the TXOP, and VO
>> used not at all.
>> 
>> rrul_noclassification better looks like the intent for classification
>> was for 802.11e and thus works better. There are a couple
>> other tests in the netperf-wrapper suite that don't use classification
>> at all, that might be saner to use.
> 
>       Ah, so in rrul_noclassification, the UDP flows still are tos marked (at 
> least that is reported in the plots and visible in the plots), but even using 
> tcp_bidirectional I see a crazy imbalance 80:1, so this laptop's Broadcom 
> BCM43xx (apple is not as informative as I would like about the components, 
> but the firmware marker points at broadcom I would say) isn't better than the 
> intel wifi in yours I would say…
> 
> the iwl is a nightmare. the 802.11ac stuff is looking bad too.
> 
> Another issue with the current implementation of rrul is my intent with the 
> specification was to test voip-like streams, an
> isochronous 10ms packet in each direction. 
> 
> The implementation currently sends measurement flows based on the RTT, just 
> like ping. As the RTT declines in length, 
> the amount of "space" used up by the measurement flow gets bigger and bigger. 
> At a 3ms RTT, just the EF measurement
> flow eats ~2/3s of the available txops as it runs through the VO queue, which 
> is limited to a single packet per txop.

        So, how much data could one fit into a txop? Would it make sense for 
the driver to "pad" the VO txop with other data just to efficiently use the air 
bottleneck?
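
        For a rough sense of scale, the arithmetic behind the figures quoted 
above can be sketched in a few lines of Python. This is only a 
back-of-the-envelope sketch: it assumes exactly one measurement packet per RTT 
per flow, and the "implied" txop budget is simply what falls out of taking the 
~2/3 figure at face value, not something measured. All names are made up for 
illustration.

```python
def packets_per_second(interval_ms):
    """Packets sent per second when one packet goes out every interval_ms."""
    return 1000.0 / interval_ms

# RTT-paced measurement flow at a 3 ms RTT (assumption: one packet per RTT).
rtt_paced_pps = packets_per_second(3.0)

# The voip-like flow the rrul specification intended: one packet every 10 ms.
isochronous_pps = packets_per_second(10.0)

# If the RTT-paced flow really eats ~2/3 of the txops (one packet per txop
# in the VO queue), this is the txop budget that figure implies.
implied_txops_per_second = rtt_paced_pps / (2.0 / 3.0)

# Share of that same budget a 10 ms isochronous flow would use instead.
isochronous_share = isochronous_pps / implied_txops_per_second

print(f"RTT-paced flow:    {rtt_paced_pps:.0f} packets/s")
print(f"Isochronous flow:  {isochronous_pps:.0f} packets/s")
print(f"Implied budget:    {implied_txops_per_second:.0f} txops/s")
print(f"Isochronous share: {isochronous_share:.0%}")
```

Under these assumptions a 10 ms isochronous flow would need well under a third 
of the txops that the RTT-paced flow takes at a 3 ms RTT.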

> The other measurement flows, like the CS5 flow, eat the VI queue, and the BE 
> and BK queues get starved for txops.

        Ah so this is why I only see the TOS UDP data in the 
rrul_noclassification test, as they are otherwise crowded out by the tcp 
streams of the same class, and not reported after the first drop...

> 
> I can barely explain to myself how the queues are supposed to get airtime 
> scheduled, see the 802.11e page on wikipedia. I thought 802.11e was a bad 
> idea in the first place... but what rrul does is try to get txops on all 4 
> queues, which means it
> needs 4x as much airtime (this is not accurate), and grabs airtime for its 
> VO queue first most of the time, followed by
> VI, BE, and BK.
> 
> I think for wifi testing with the current rrul test there needs to be a new 
> test that does everything in BE. (toke?)
> Classification is very rarely used in the real world anyway.

        So that means the UDP streams as well?

> 
> Most of the usage of rrul to date has been over longer RTTs over ethernet... 
> (again, I'm delighted y'all are doing this,
> and I do hope to get a more voip-like test) 

        Yeah netperf-wrapper has been a delight in getting the ATM mess sorted 
out, great work. And now with the successor in the works things will get even 
better :)

>  
> 
> <tcp_bidirectional_hms-beagle_2_cerowrt.png>
> 
>> 
>> lastly, if you are doing a test over the internet, many providers pee
>> on the tos bits. Unless you've done a packet capture, you can't trust
>> that you are actually seeing classified packets coming back from the
>> internet.
> 
>       Good point, comparing just the local rrul plots with the ones to demo, 
> I see what you mean, there is a tiny bit of the priority classes visible in 
> the uplink (but barely) and none at all in the downlink, so my ISP does not 
> think too much of the tos bits (I guess the tos effect on the uplink is from 
> what cero is doing and since cero controls the bottleneck some "imprint" 
> remains to be seen at packet reception time at demo, or so I think...).
> 
> simple.qos respects 3 of the 4 tiers that wifi does.
> 
> simplest does not.

        I know, even though I have no real use case I like the general idea of 
having dedicated bandwidth-limited channels for normal, important, and 
background traffic. Sort of just in case, belt-and-suspenders kind of thinking.

>  
> 
>> 
>> One of the things I hope to fix with the twd effort is to detect tos
>> bit preservation and note it in the test.
>> 
>> I'm delighted y'all are seeing these results for yourselves. Getting
>> dinged on bandwidth after aiming for low latency by the public is not
>> something I'd wanted to happen with a "stable" release. Regrettably
>> fixing the drivers to work better only has
>> felix working on it in his spare time, and I've been trying to clear
>> my plate for months to help do the delicate rework
>> required. (or recruit others to help)
> 
>       I would love to help, but this is far out of my league and area of 
> expertise…
> 
> 
> yer helping plenty, and the more people that "get this", the sooner people 
> will work
> on fixing it. I have enjoyed trying to explain these behaviors today. Someday
> once we have words that match the concepts they will make sense to a CTO.
>  
> I have been very pleased by googling for bufferbloat of late. Almost everyone 
> that
> has talked about it on the web for the past month seems to get it.
> 
> So if we start now, and make this the year of "make-wifi-fast", in a couple 
> years
> maybe the world will get it...
> 
> ... sadly long after 802.11ac is fully deployed and messing up everything for
> everybody.

        "Make wifi fast" is a pretty good motto…

Best regards
        Sebastian


> best
>       Sebastian
> 
>> 
>> 
>>> About the other issue I do not know anything…
>>> 
>>> Best Regards
>>>        Sebastian
>>> 
>>>> This is me 1-2 meters from the router.  Load was never more than 0.33.  (I 
>>>> can share the results if people are interested).
>>>> 
>>>> After a full power cycle, wifi isn't coming up at all.
>>>> 
>>>> =====
>>>> 
>>>> How I got here:
>>>> 
>>>> 
>>>> I'm in France, and had dutifully set my unit with the FR country code when 
>>>> setting up CeroWRT.  I had noticed some odd latencies (periodic 100-200ms 
>>>> latency every 10-20 seconds over wifi) on the 5GHz network.  The router 
>>>> was on channel 36, and I wanted to move it up to the far-upper ranges, so 
>>>> I tried to specify a "custom" channel to do so (140).  This was the 
>>>> channel I thought I had been using with stock (Netgear) firmware.
>>>> 
>>>> Wifi didn't come back up after applying the changes, and the luci 
>>>> interface seemed to be tripping up over stuff that it was reading out of 
>>>> the configuration files.
>>>> 
>>>> I ssh'd in via ethernet, and fixed up the configurations by hand.
>>>> 
>>>> Except the driver is still reporting that the 5GHz network won't kick into 
>>>> 802.11n modes, and won't use HT40.  It seems to be sure it's configured 
>>>> for it, but isn't using it.
>>>> 
>>>> Further, digging into the rc_stats files with the minstrel speeds, I found 
>>>> some very odd data (not what I was expecting to see):
>>>> 
>>>> (laptop, which can do 2x2 HT40)
>>>> rate      throughput  ewma prob  this prob  this succ/attempt   success    attempts
>>>>   D   6         6.0       99.9      100.0             2(  2)        65          65
>>>>       9         0.0        0.0        0.0             0(  0)         0           0
>>>>      12         2.9       25.0      100.0             0(  0)         1           1
>>>>      18         4.3       25.0      100.0             0(  0)         1           1
>>>>      24         5.6       25.0      100.0             0(  0)         1           1
>>>> A   P 36        32.4       99.9      100.0             0(  0)        51          51
>>>>  C   48        10.4       25.0      100.0             0(  0)         1           1
>>>> B    54        11.5       25.0      100.0             0(  0)         1           1
>>>> 
>>>> Total packet count::    ideal 53      lookaround 7
>>>> 
>>>> (AppleTV, 1x1 HT20)
>>>> root@cerowrt:/sys/kernel/debug/ieee80211/phy1/netdev:sw10# cat stations/58\:55\:ca\:51\:b5\:4b/rc_stats
>>>> rate      throughput  ewma prob  this prob  this succ/attempt   success    attempts
>>>>       6         3.5       57.8      100.0             0(  0)         6           6
>>>>       9         3.9       43.7      100.0             0(  0)         2           2
>>>>      12         5.1       43.7      100.0             0(  0)         2           2
>>>>      18        10.0       57.8      100.0             0(  0)         3           3
>>>>   D  24        13.1       57.8      100.0             0(  0)         3           3
>>>>  C   36        14.2       43.7      100.0             0(  0)         2           2
>>>> B    48        18.2       43.7      100.0             0(  0)         2           2
>>>> A   P 54        46.2       99.9      100.0             1(  1)       348         367
>>>> 
>> 
>> No AMPDUs. Hmm. Might be a bug.
>> 
>>>> Total packet count::    ideal 331      lookaround 37
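
To compare rows like the ones above without eyeballing the columns, here is a 
minimal Python sketch that parses one legacy-format rc_stats row and computes 
its lifetime success ratio. It assumes the column layout shown in the dumps 
above (flags, rate, throughput, ewma prob, this prob, this succ/attempt, 
success, attempts). The function name and the returned dictionary keys are 
hypothetical, and the flag letters are kept verbatim without interpreting them.

```python
import re

def parse_rc_stats_row(line):
    """Parse one legacy minstrel rc_stats row; return None for non-data lines."""
    # Collapse "1(  1)" into "1(1)" so the row splits cleanly on whitespace.
    tokens = re.sub(r"\(\s+", "(", line).split()
    if len(tokens) < 7:
        return None
    # The last seven tokens are the fixed columns; anything before is flags.
    flags = "".join(tokens[:-7])
    rate, tput, ewma, this_prob, _this_sa, success, attempts = tokens[-7:]
    try:
        return {
            "flags": flags,
            "rate_mbps": float(rate),
            "throughput": float(tput),
            "ewma_prob": float(ewma),
            "this_prob": float(this_prob),
            "success": int(success),
            "attempts": int(attempts),
        }
    except ValueError:
        # Header lines and anything else that is not numeric end up here.
        return None

SAMPLE_ROW = "A   P 54        46.2       99.9      100.0             1(  1)       348         367"
HEADER = "rate      throughput  ewma prob  this prob  this succ/attempt   success    attempts"

row = parse_rc_stats_row(SAMPLE_ROW)
header = parse_rc_stats_row(HEADER)
lifetime_ratio = row["success"] / row["attempts"]
print(row["flags"], row["rate_mbps"], f"{lifetime_ratio:.1%}")
```

The sample row is the 54 Mbit/s line from the AppleTV dump. The newer 
HT-format dump (with the type and retry columns) has a different layout and 
would need its own parser.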
>> 
>> Hmm. The radios are set for HT20 for the 2.4ghz and HT40+ for the
>> 5ghz. I note that
>> for HT40 in wireless-n the 8 channels used up need to be congruent.
>> 
>> HT40+ is 36+40, and 44+48 for example. You can't do 40+44.
>> 
>> Availability of HTXX is dependent upon your regulatory domain.
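
The pairing rule quoted above (36+40 and 44+48 are fine, 40+44 is not) can be 
sketched as a small lookup in Python. The set of primary channels below is 
written from memory of the usual 5GHz 40MHz pairs and is not claimed to be 
complete, and what is actually permitted still depends on the regulatory 
domain. The names are hypothetical.

```python
# Lower (primary) channels of the usual 5GHz 40MHz pairs (assumed list,
# not exhaustive; the regulatory domain may forbid some of them).
HT40_PLUS_PRIMARIES = {36, 44, 52, 60, 100, 108, 116, 124, 132, 149, 157}

def ht40_plus_secondary(primary):
    """Return the secondary channel for HT40+ on `primary`, or None if invalid."""
    if primary in HT40_PLUS_PRIMARIES:
        # HT40+ places the extension channel 20MHz (4 channel numbers) above.
        return primary + 4
    return None

for channel in (36, 40, 44, 48):
    print(channel, ht40_plus_secondary(channel))
```

Channels 40 and 48 come back as None because each is the upper half of its 
pair, so the extension channel has to sit below the primary rather than above 
it.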
>> 
>>>> Whereas what I'm seeing for the 2.4GHz radio is:
>>>> 
>>>> root@cerowrt:/sys/kernel/debug/ieee80211/phy0/netdev:sw00/stations# cat 10\:9a\:dd\:30\:96\:34/rc_stats
>>>> type         rate     throughput  ewma prob   this prob  retry   this succ/attempt   success    attempts
>>>> CCK/LP        1.0M           0.7      100.0       100.0      0              0(  0)         2           2
>>>> CCK/SP        2.0M           0.0        0.0         0.0      0              0(  0)         0           0
>>>> CCK/SP        5.5M           0.0        0.0         0.0      0              0(  0)         0           0
>>>> CCK/SP       11.0M           0.0        0.0         0.0      0              0(  0)         0           0
>>>> HT20/LGI     MCS0            5.6      100.0       100.0      1              0(  0)         2           2
>>>> HT20/LGI     MCS1            0.0        0.0         0.0      0              0(  0)         0           0
>>>> HT20/LGI     MCS2            0.0        0.0         0.0      0              0(  0)         0           0
>>>> HT20/LGI     MCS3            0.0        0.0         0.0      0              0(  0)         0           0
>>>> HT20/LGI     MCS4            0.0        0.0         0.0      0              0(  0)         0           0
>>>> HT20/LGI     MCS5           30.3      100.0       100.0      5              0(  0)         1           1
>>>> HT20/LGI  t  MCS6           32.5      100.0       100.0      5              0(  0)        11          11
>>>> HT20/LGI T P MCS7           35.0      100.0       100.0      5              6(  6)        34          34
>>>> 
>>>> Total packet count::    ideal 45      lookaround 3
>>>> Average A-MPDU length: 1.3
>> 
>> You are doing good at the highest possible rate. However packet
>> aggregation is pretty terrible.
>> 
>>>> 
>>>> And here are radio blocks from the current /etc/config/wireless:
>>>> 
>>>> config wifi-device 'radio1'
>>>>      option type 'mac80211'
>>>>      option macaddr '28:c6:8e:bb:9a:49'
>>>>      list ht_capab 'SHORT-GI-40'
>>>>      list ht_capab 'TX-STBC'
>>>>      list ht_capab 'RX-STBC1'
>>>>      list ht_capab 'DSSS_CCK-40'
>>>>      option txpower '17'
>>>>      option distance '25'
>>>>      option channel '48'
>>>>      option country 'US'
>>>> 
>>>> config wifi-device 'radio0'
>>>>      option type 'mac80211'
>>>>      option hwmode '11ng'
>>>>      option macaddr '28:c6:8e:bb:9a:47'
>>>>      option htmode 'HT20'
>>>>      list ht_capab 'SHORT-GI-40'
>>>>      list ht_capab 'TX-STBC'
>>>>      list ht_capab 'RX-STBC1'
>>>>      list ht_capab 'DSSS_CCK-40'
>>>>      option txpower '26'
>>>>      option country 'FR'
>>>>      option distance '15'
>>>>      option channel 'auto'
>> 
>> I don't know anyone that has fiddled with distance to such an extent.
>> your country codes need to be the same and you should look at what
>> is allowed in FR.
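
A quick way to spot the country-code mismatch mentioned above is to pull the 
country option out of each wifi-device section. The Python sketch below assumes 
the simple single-quoted layout shown in the quoted /etc/config/wireless blocks 
and is not a general UCI parser. The function name is hypothetical, and the 
sample text is a trimmed copy of the two radio blocks.

```python
import re

def countries_by_radio(uci_text):
    """Map each wifi-device section name to its 'country' option (or None)."""
    radios = {}
    current = None
    for raw in uci_text.splitlines():
        line = raw.strip()
        if line.startswith("config "):
            # Only wifi-device sections are tracked; others reset the state.
            m = re.match(r"config\s+wifi-device\s+'([^']+)'", line)
            current = m.group(1) if m else None
            if current is not None:
                radios[current] = None
            continue
        m = re.match(r"option\s+country\s+'([^']+)'", line)
        if m and current is not None:
            radios[current] = m.group(1)
    return radios

# Trimmed copy of the two radio blocks quoted above.
SAMPLE = """
config wifi-device 'radio1'
     option type 'mac80211'
     option channel '48'
     option country 'US'

config wifi-device 'radio0'
     option type 'mac80211'
     option htmode 'HT20'
     option country 'FR'
     option channel 'auto'
"""

found = countries_by_radio(SAMPLE)
mismatch = len(set(found.values())) > 1
print(found, "MISMATCH" if mismatch else "consistent")
```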
>> 
>>>> ======
>>>> 
>>>> Some notes after having repaired the situation:
>>>> 
>>>> - The pci paths to the radios were missing from /etc/config/wireless, 
>>>> that's the only thing that I saw that seemed grossly out of place.
>>>> 
>>>> - Back up and running, and yes, it's much happier, now.  Over wifi I get 
>>>> 60-70Mbps upload and ~40Mbps download (running rrul).  Latency sucks.  
>>>> Wifi has some ugly bufferbloat.  (although these results are somewhat in 
>>>> question when the router has a 1m load average over 5.0...)
>> 
>> Trying to measure the one way delay here is important (and hard. The
>> only tool I've found for it so far was owamp, so I'm trying to write
>> that test in twd). A TON of your delay is coming from your client. A
>> network connection is like a fountain, or a toilet, both sides of the
>> flow count...
>> 
>>>> 
>>>> - Enabling all the SQM features I was having previously also considerably 
>>>> cleaned up wifi performance.  It's more balanced, but still not nearly as 
>>>> balanced as I see on gigabit ethernet.
>>>> 
>>>> 
>>>> 
>>>> -Aaron
>>>> _______________________________________________
>>>> Cerowrt-devel mailing list
>>>> Cerowrt-devel@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>> 
>> 
>> 
>> 
>> -- 
>> Dave Täht
>> 
>> Fixing bufferbloat with cerowrt: 
>> http://www.teklibre.com/cerowrt/subscribe.html
> 
