On 23 Aug 2013, at 13:37, Sebastian Moeller <[email protected]> wrote:

> Hi Jesper,
> 
> thanks for your time…
> 
> On Aug 23, 2013, at 13:16 , Jesper Dangaard Brouer <[email protected]> wrote:
> 
>> On Fri, 23 Aug 2013 12:15:12 +0200
>> Sebastian Moeller <[email protected]> wrote:
>> 
>>> Hi Jesper,
>>> 
>>> 
>>> On Aug 23, 2013, at 09:27 , Jesper Dangaard Brouer <[email protected]> 
>>> wrote:
>>> 
>>>> On Thu, 22 Aug 2013 22:13:52 -0700
>>>> Dave Taht <[email protected]> wrote:
>>>> 
>>>>> On Thu, Aug 22, 2013 at 5:52 PM, Sebastian Moeller <[email protected]> 
>>>>> wrote:
>>>>> 
>>>>>> Hi List, hi Jesper,
>>>>>> 
>>>>>> So I tested 3.10.9-1 to assess the status of the HTB atm link layer
>>>>>> adjustments to see whether the recent changes resurrected this feature.
>>>>>>      Unfortunately the htb_private link layer adjustments are still
>>>>>> broken (RRUL ping RTT against Toke's netperf host in Germany of ~80ms, 
>>>>>> same as without link layer adjustments). On the bright side, the 
>>>>>> tc_stab method still works as well as before (ping RTT around 40ms).
>>>>>>      I would like to humbly propose to use the tc stab method in
>>>>>> cerowrt to perform ATM link layer adjustments by default. To repeat 
>>>>>> myself, simply telling the kernel a lie about the packet size seems 
>>>>>> more robust than fudging HTB's rate tables.
>>>> 
>>>> After the (regression) commit 56b765b79 ("htb: improved accuracy at
>>>> high rates"), the kernel no longer uses the rate tables.  
>>> 
>>>     See, I am quite a layman here; spelunking through the tc and kernel 
>>> source code made me believe that the rate tables are still used (I might 
>>> have looked at too-old versions of both repositories, though).
>>> 
>>>> 
>>>> My commit 8a8e3d84b1719 (net_sched: restore "linklayer atm" handling),
>>>> does the ATM cell overhead calculation directly on the packet length,
>>>> see psched_l2t_ns() doing (DIV_ROUND_UP(len,48)*53).
>>>> Thus, the cell calc should actually be more precise now... but see below
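>>>> 
>>>> For illustration, here is a minimal stand-alone model of that cell math
>>>> (only the 48/53 constants and the DIV_ROUND_UP come from the patch; the
>>>> rest is scaffolding of mine, not kernel code):
>>>> 
>>>> #include <stdio.h>
>>>> 
>>>> #define ATM_CELL_PAYLOAD 48  /* payload bytes per ATM cell */
>>>> #define ATM_CELL_SIZE    53  /* payload plus 5-byte cell header */
>>>> #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
>>>> 
>>>> /* Wire length after ATM cell framing, as in psched_l2t_ns() */
>>>> static unsigned int atm_wire_len(unsigned int len)
>>>> {
>>>>     return DIV_ROUND_UP(len, ATM_CELL_PAYLOAD) * ATM_CELL_SIZE;
>>>> }
>>>> 
>>>> int main(void)
>>>> {
>>>>     /* A 1500-byte packet needs 32 cells -> 1696 bytes on the wire */
>>>>     printf("%u\n", atm_wire_len(1500));
>>>>     return 0;
>>>> }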
>>> 
>>>     Is there any way to make HTB report which link layer it assumes?
>> 
>> I added some print debug statements in my patch, so you can see this in
>> the kernel log / dmesg.  To activate the debugging print statements, run:
>> 
>> mount -t debugfs none /sys/kernel/debug/
>> echo "func __detect_linklayer +p" > /sys/kernel/debug/dynamic_debug/control
>> 
>> Run your tc script, then run dmesg or look in the kernel syslog.
> 
>       Ah, unfortunately I am not set up to build new kernels for the router 
> I am testing on, so I would hereby like to beg Dave to include that patch 
> in one of the next releases. Would it not be a good idea to teach tc to 
> report the link layer for HTB as it does for stab? Having to empirically 
> figure out whether it is applied or not is somewhat cumbersome...
> 
> 
>> 
>> 
>>>> 
>>>>>> Especially since the kernel already fudges
>>>>>> the packet size to account for the Ethernet header and then some, this
>>>>>> path should receive more scrutiny by virtue of having more users?
>>>> 
>>>> As you mention, the default kernel path (not tc stab) fudges the packet
>>>> size for Ethernet headers, AND I made a mistake (back in approximately
>>>> 2006, sorry): the "overhead" cannot be a negative number.  
>>> 
>>>     Mmh, does this also apply to stab?
>> 
>> This seems to be two questions...
>> 
>> Yes, the Ethernet header size gets adjusted/added before the "stab"
>> call. For reference, see net/core/dev.c, function __dev_xmit_skb(), which
>> calls qdisc_pkt_len_init(skb)  // adjusts for Ethernet and accounts for GSO
>> and then qdisc_calculate_pkt_len(skb, q)  // the stab call
>> (which in turn calls __qdisc_calculate_pkt_len() in net/sched/sch_api.c).
>> 
>> The qdisc_pkt_len_init() call was introduced by Eric in
>> v3.9-rc1~139^2~411.
> 
>       So I look at 3.10 here:
> 
> net/core/dev.c, qdisc_pkt_len_init
> line 2628:    qdisc_skb_cb(skb)->pkt_len = skb->len;
> and in 
> line 2650: qdisc_skb_cb(skb)->pkt_len += (gso_segs - 1) * hdr_len;
> so the adjusted size does not seem to end up in skb->len
> 
> 
> and then in 
> net/sched/sch_api.c, __qdisc_calculate_pkt_len
> line 440: pkt_len = skb->len + stab->szopts.overhead;
> 
> So to my eyes this looks like stab is not honoring the changes made in 
> qdisc_pkt_len_init, no? At least I fail to see where skb->len is assigned 
> the value of qdisc_skb_cb(skb)->pkt_len.
> But I happily admit that I am truly a novice in these matters and easily 
> intimidated by C code.
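> 
> To make the divergence concrete, here is a small user-space model of the
> two computations (the 3000/54/2 example values are assumptions of mine;
> only the two formulas come from the kernel lines quoted above):
> 
> #include <stdio.h>
> 
> int main(void)
> {
>     unsigned int skb_len  = 3000; /* a GSO super-packet (assumed) */
>     unsigned int hdr_len  = 54;   /* per-segment header bytes (assumed) */
>     unsigned int gso_segs = 2;
>     int overhead = 40;            /* my stab "overhead" parameter */
> 
>     /* qdisc_pkt_len_init(), line 2650: */
>     unsigned int cb_pkt_len = skb_len + (gso_segs - 1) * hdr_len;
> 
>     /* __qdisc_calculate_pkt_len(), line 440, restarts from skb->len: */
>     unsigned int stab_pkt_len = skb_len + overhead;
> 
>     printf("cb->pkt_len=%u stab pkt_len=%u (%u GSO header bytes unseen)\n",
>            cb_pkt_len, stab_pkt_len, (gso_segs - 1) * hdr_len);
>     return 0;
> }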
> 
> 
>> 
>> Thus, in kernels >= 3.9, you would need to reduce your tc
>> "overhead" parameter by 14 bytes (if you accounted for the encapsulated
>> Ethernet header before).
> 
>       That is what I thought before, but my kernel spelunking made me 
> reconsider and switch to not subtracting the 14 bytes, since, as I 
> understand it, the kernel actively does not apply that adjustment if stab 
> is used.
> 
>> 
>> The "overhead" of stab can be negative, so no problem here; it is an
>> "int" for stab.
>> 
>> 
>>>> Meaning that
>>>> some ATM encap overheads simply cannot be configured correctly (as you
>>>> need to subtract the ethernet header).
>>> 
>>>     Yes, I see; luckily PPPoA and IPoA seem quite rare, and setting the 
>>> overhead to be larger than it actually is is relatively benign, as it 
>>> will just overestimate the packet size.


As a point of information, the entire UK uses PPPoA rather than PPPoE, and 
some hundreds of thousands of users use IPoA.



>>> 
>>>> (And it's quite problematic to
>>>> change the kABI to allow for a negative overhead.)
>>> 
>>>     Again, I have no clue, but overhead seems to be an integer, not 
>>> unsigned, so why can it not be negative?
>> 
>> Nope, for reference in include/uapi/linux/pkt_sched.h
>> 
>> This struct tc_ratespec is used by the normal "HTB/TBF" rate system,
>> notice "unsigned short       overhead".
>> 
>> struct tc_ratespec {
>>      unsigned char   cell_log;
>>      __u8            linklayer; /* lower 4 bits */
>>      unsigned short  overhead;
>>      short           cell_align;
>>      unsigned short  mpu;
>>      __u32           rate;
>> };
>> 
>> 
>> This struct tc_sizespec is used by stab system, where the overhead is
>> an int.
>> 
>> struct tc_sizespec {
>>      unsigned char   cell_log;
>>      unsigned char   size_log;
>>      short           cell_align;
>>      int             overhead;
>>      unsigned int    linklayer;
>>      unsigned int    mpu;
>>      unsigned int    mtu;
>>      unsigned int    tsize;
>> };
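>> 
>> As a quick user-space demonstration of why the ratespec overhead cannot
>> go negative while the sizespec one can (nothing kernel-specific in this
>> sketch, it only shows the wrap-around of the two field types):
>> 
>> #include <stdio.h>
>> 
>> int main(void)
>> {
>>     unsigned short rate_overhead = (unsigned short)-14; /* tc_ratespec */
>>     int            stab_overhead = -14;                 /* tc_sizespec */
>> 
>>     /* The unsigned short wraps around: prints "65522 vs -14" */
>>     printf("%u vs %d\n", rate_overhead, stab_overhead);
>>     return 0;
>> }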
> 
>       Ah, good to know.
> 
>> 
>> 
>> 
>> 
>>>> 
>>>> Perhaps we should change to use "tc stab" for this reason.  But I'm not
>>>> sure "stab" does the right thing either, and its accuracy is also
>>>> limited, as it's actually also table-based.
>>> 
>>>     But why should a table be problematic here? As long as we can assure 
>>> that the table covers the largest packet, we are golden. So either we do 
>>> the manly and stupid thing and go for 9000 byte jumbo packets for the 
>>> table size, or we assume that for the most part ATM users will at best 
>>> use baby jumbo frames (I think BT does this to allow a payload MTU of 
>>> 1500 in spite of the PPPoE encapsulation overhead), but then we are quite 
>>> fine with the default size table maxMTU of 2048 bytes, no?
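>>> 
>>> For my own understanding, here is a simplified model of the stab lookup
>>> (based on my reading of __qdisc_calculate_pkt_len(); the kernel's
>>> handling of slots past the table end is reduced to a clamp, so treat
>>> this as a sketch only):
>>> 
>>> #include <stdio.h>
>>> 
>>> struct sizespec { int cell_log, cell_align, overhead; unsigned int tsize; };
>>> 
>>> static unsigned int stab_pkt_len(const struct sizespec *s,
>>>                                  const unsigned short *data,
>>>                                  unsigned int skb_len)
>>> {
>>>     int slot = ((int)skb_len + s->overhead + s->cell_align) >> s->cell_log;
>>> 
>>>     if (slot < 0)
>>>         slot = 0;
>>>     if ((unsigned int)slot >= s->tsize) /* off the table end: clamp */
>>>         slot = s->tsize - 1;
>>>     return data[slot];
>>> }
>>> 
>>> int main(void)
>>> {
>>>     /* Toy 4-entry table, 64-byte buckets (values for illustration) */
>>>     unsigned short data[4] = { 53, 106, 159, 212 };
>>>     struct sizespec s = { .cell_log = 6, .cell_align = 0,
>>>                           .overhead = 40, .tsize = 4 };
>>> 
>>>     printf("%u\n", stab_pkt_len(&s, data, 100)); /* 140 >> 6 -> slot 2 */
>>>     return 0;
>>> }
>>> 
>>> The table covers tsize << cell_log bytes, so a default-sized table
>>> reaching 2048 bytes is indeed fine as long as no (baby) jumbo frame
>>> exceeds it.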
>> 
>> 
>> It is the GSO problem that I'm worried about.  The kernel will bunch up
>> packets, and that caused the length calculation issue... just disable
>> GSO.
>> 
>> ethtool -K eth63 gso off gro off tso off
> 
>       Oh, as always Dave has this tackled in cerowrt already; all offloads 
> are off (at 16000/2500 Kbit/s there should be no need for offloads 
> nowadays, I guess). But I see the issue now. 
> 
> 
>> 
>> 
>>>> We could easily change the
>>>> kernel to perform the ATM cell overhead calc inside "stab", and we
>>>> should also fix the GSO packet overhead problem.
>>>> (For now, remember to disable GSO when shaping.)
>>> 
>>>     Yeah, I stumbled over the fact that the stab mechanism does not 
>>> honor the kernel's earlier adjustments of packet length (but I seem to 
>>> be unable to find the actual file and line where this is initially 
>>> handled). It would seem relatively easy to make stab take the earlier 
>>> adjustment into account. Regarding GSO, I assumed that GSO will not play 
>>> nicely with an AQM anyway, as a single large packet will hog too much 
>>> transfer time...
>>> 
>> 
>> Yes, just disable GSO ;-)
> 
>       Done.
> 
>> 
>> 
>>>> 
>>>>> It's my hope that the atm code works but is misconfigured. You can 
>>>>> output the tc commands by overriding the TC variable with TC="echo tc" 
>>>>> and paste them here.
>>>> 
>>>> I also hope it is a misconfig.  Please show us the config/script.
>>> 
>>>     Will do this later. I would be delighted if it is just me being stupid.
>>> 
>>>> 
>>>> I would appreciate a link to the scripts you are using... perhaps a git 
>>>> tree?
>>> 
>>>     Unfortunately I have no git tree and no experience with git. I do not 
>>> think I will be able to set something up quickly. But I use a modified 
>>> version of cerowrt's AQM scripts which I will post later.
>>> 
>> 
>> Someone just point me to the cerowrt git repo... please, and point me
>> at the simple.qos script.
>> 
> <simple.qos><functions.sh>
>> 
>> 
>> Did you add the "linklayer atm" yourself to Dave's script?
> 
>       Well, partly; the option for HTB was already in his script but 
> little tested. I changed the script to add stab and to allow easier 
> configuration of overhead, mpu, mtu and tsize (just for stab) from the 
> GUI, but the code is Dave's. I attached the scripts. functions.sh gets the 
> values from the configuration GUI. I extended the way the linklayer option 
> strings are created, but basically it is the same method that Dave used. 
> And I do see the right overhead values appear in "tc -d qdisc", so at 
> least something is reaching HTB. Sorry that I have no repository for 
> easier access.
> 
> 
> 
> 
>> 
>> 
>> 
>> 
>>>> 
>>>>>>      Now, I have been testing this using Dave's most recent cerowrt
>>>>>> alpha version with a 3.10.9 kernel on mips hardware; I think this 
>>>>>> kernel should contain all htb fixes, including commit 8a8e3d84b17 
>>>>>> (net_sched: restore "linklayer atm" handling), but am not fully sure.
>>>>>> 
>>>>> 
>>>>> It does.
>>>> 
>>>> It has not hit the stable tree yet, but DaveM promised he would pass it 
>>>> along.
>>>> 
>>>> It does seem Dave Taht has my patch applied:
>>>> http://snapon.lab.bufferbloat.net/~cero2/patches/3.10.9-1/685-net_sched-restore-linklayer-atm-handling.patch
>>> 
>>>     Ah, good, so it should have worked.
>> 
>> It should...
>> 
>> 
>>>> 
>>>>>> While I am not able to build kernels, it seems that I am able to 
>>>>>> quickly test whether link layer adjustments work or not. So I am 
>>>>>> happy to help where I can :)
>>>> 
>>>> So, what is your lab setup that allows you to test this quickly?
>>> 
>>>     Oh, Dave and Toke are the giants on whose shoulders I stand here 
>>> (thanks, guys); all I bring to the table basically is the fact that I 
>>> have an ATM-carried ADSL2+ connection at home.
>> 
>> I will soon have an ADSL lab again, so I will try to reproduce your
>> results.  Actually, almost while typing this email, the postman arrived
>> at my house and delivered a new ADSL modem... as a local Danish ISP
>> www.fullrate.dk has been so kind to give me a test line for free
>> (thanks fullrate!).
> 
>       This is great! Even though I am quite sure that no real DSL link is 
> actually required to test the effect of the link layer adjustments.
> 
>> 
>> 
>>>     Anyway, my theory is that proper link layer adjustments should only 
>>> show up if not performing them would make my traffic exceed my link 
>>> speed and hence accumulate in the DSL modem's bloated buffers, leading 
>>> to measurable increases in latency. So I try to saturate both the up- 
>>> and downlink while measuring latency under different conditions. Since 
>>> the worst-case overhead of the ATM encapsulation approaches 50% (with 
>>> the best case being around 10%), I test the system while shaping to 95% 
>>> of the link rates, where I do expect to see an effect of the link layer 
>>> adjustments, and while shaping to 50%, where I do not expect to see an 
>>> effect. And basically that seems to work.
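>>> 
>>> A worked example of why I expect the two shaping levels to behave
>>> differently (the 1500-byte packet and my 40 bytes of overhead are
>>> assumptions; the 48/53 cell constants are ATM framing):
>>> 
>>> #include <stdio.h>
>>> 
>>> int main(void)
>>> {
>>>     unsigned int len   = 1500 + 40;        /* packet + encap overhead */
>>>     unsigned int cells = (len + 47) / 48;  /* DIV_ROUND_UP(len, 48) */
>>>     double inflation   = cells * 53.0 / 1500.0; /* wire vs payload */
>>> 
>>>     printf("wire inflation: %.3f\n", inflation);       /* ~1.166 */
>>>     /* Shaping to a fraction of the sync rate while ignoring ATM: */
>>>     printf("95%% shaping -> %.0f%% of link\n", 95.0 * inflation);
>>>     printf("50%% shaping -> %.0f%% of link\n", 50.0 * inflation);
>>>     /* 95% lands above 100% (the modem queues, RTT rises); 50% stays
>>>      * comfortably below, so no linklayer effect on latency there. */
>>>     return 0;
>>> }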
>> 
>>>     Practically, I use Toke's netperf-wrapper project with the RRUL test 
>>> from my cerowrt router behind an ADSL2+ modem to a nearby netperf server 
>>> in Germany. The link layer adjustments are configured in my cerowrt 
>>> router, using Dave's simple.qos script (a 3-band HTB shaper with 
>>> fq_codel on each leaf, taking my overhead of 40 bytes into account and 
>>> optionally the link layer).
>> 
>>>     It turns out that this test nicely saturates my link with 4 up
>>> and 4 down TCP flows and uses a train of ping probes at a 0.2 second
>>> period to assess the latency induced by saturating the links. Now I
>>> shape down to 95% and 50% of line rates and simply look at the ping RTT
>>> plot for different conditions. In my rig I see around 30ms ping RTT
>>> without load, 80ms with full saturation and no link layer adjustments,
>>> and 40ms with working link layer adjustments (hand in hand with slightly
>>> reduced TCP goodput, just as one would expect). In my testing so far,
>>> activating the HTB link layer adjustments yielded the same 80ms delay I
>>> get without link layer adjustments. If I shape down to 50% of link
>>> rates, HTB, stab and no link layer adjustments all yield a ping RTT of
>>> ~40ms. Still, with proper link layer adjustments the TCP goodput is
>>> reduced even at 50% shaping. As Dave explained, with an unloaded swallow
>>> ermm, ping RTT of 30ms and fq_codel's target set to 5ms, the best case
>>> would be 30ms + 2*5ms or 40ms, so I am pretty close to ideal with proper
>>> link layer adjustments.
>>> 
>> 
>>>     I guess it should be possible to simply use the reduction in goodput 
>>> as an easy indicator whether the link layer adjustments work or not. But 
>>> to do this properly I would need to be able to control the size of the 
>>> sent packets, which I cannot, at least not with RRUL. But I am quite 
>>> sure real computer scientists could easily set something up to test the 
>>> goodput through a shaping device at differently sized packet streams of 
>>> the same bandwidth, but I digress. 
>>> 
>>>     On the other hand, I do not claim to be an expert in this field in 
>>> any way, and my measurement method might be flawed; if you think so, 
>>> please do not hesitate to let me know how I could improve it.
>> 
>> I have a hard time following your description... sorry.
> 
> Okay, I see, let me try to present the data in a more ordered fashion:
> BW: bandwidth [Kbit/s]
> LLAM: link-layer adjustment method
> LL: link layer
> GP: goodput [Kbit/s]
> 
> #  shaped  downBW  (%)  upBW  (%)  LLAM  LL    ping RTT  downGP  upGP
> 1  no       16309  100  2544  100  none  none     300ms   10000  1600
> 2  yes      14698   90  2430   95  none  none      80ms   13600  1800
> 3  yes      14698   90  2430   95  stab  adsl      40ms   11600  1600
> 4  yes      15494   95  2430   95  stab  adsl      42ms   12400  1600
> 5  yes      14698   90  2430   95  htb   adsl      75ms   13200  1600
> 
> 2  yes       7349   45  1215   48  none  none      45ms    6800  1000
> 4  yes       7349   45  1215   48  stab  adsl      42ms    5800   800
> 5  yes       7349   45  1215   48  htb   adsl      45ms    6600  1000
> 
> Notes: upGP is way noisier than downGP and therefore harder to estimate
> 
> So conditions #3 and #4 show the best latency at high link saturation, 
> where the link layer adjustments actually make a difference by controlling 
> whether the DSL modem will buffer or not.
> At ~50% link saturation there is not much, if any, effect of the link 
> layer adjustments on latency, but they still leave their hallmark as a 
> goodput reduction. (The partial reduction for htb might be caused by the 
> specification of 40 bytes of overhead, which seems to have been honored.)
> I take the disappearance of the latency effect at 50% as a control data point 
> that shows my measurement approach seems sane enough.
> I hope this clears up the information I wanted to give you the first time 
> around.
> 
> 
> 
>> 
>> So, did you get a working low-latency setup by using 95% shaping and
>> "stab" linklayer adjustment?
> 
>       Yes.  (With a 3-leaf HTB as shaper and fq_codel as leaf qdisc.)
> 
> Best Regards
>       Sebastian
> 
>> 
>> -- 
>> Best regards,
>> Jesper Dangaard Brouer
>> MSc.CS, Sr. Network Kernel Developer at Red Hat
>> Author of http://www.iptv-analyzer.org
>> LinkedIn: http://www.linkedin.com/in/brouer
> 

_______________________________________________
Cerowrt-devel mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/cerowrt-devel
