On 23 Aug 2013, at 13:37, Sebastian Moeller <[email protected]> wrote:
> Hi Jesper,
>
> thanks for your time…
>
> On Aug 23, 2013, at 13:16 , Jesper Dangaard Brouer <[email protected]> wrote:
>
>> On Fri, 23 Aug 2013 12:15:12 +0200
>> Sebastian Moeller <[email protected]> wrote:
>>
>>> Hi Jesper,
>>>
>>> On Aug 23, 2013, at 09:27 , Jesper Dangaard Brouer <[email protected]> wrote:
>>>
>>>> On Thu, 22 Aug 2013 22:13:52 -0700
>>>> Dave Taht <[email protected]> wrote:
>>>>
>>>>> On Thu, Aug 22, 2013 at 5:52 PM, Sebastian Moeller <[email protected]> wrote:
>>>>>
>>>>>> Hi List, hi Jesper,
>>>>>>
>>>>>> So I tested 3.10.9-1 to assess the status of the HTB ATM link layer adjustments, to see whether the recent changes resurrected this feature. Unfortunately the htb_private link layer adjustment is still broken (RRUL ping RTT against Toke's netperf host in Germany of ~80ms, same as without link layer adjustments). On the bright side, the tc_stab method still works as well as before (ping RTT around 40ms).
>>>>>> I would like to humbly propose using the tc stab method in cerowrt to perform ATM link layer adjustments by default. To repeat myself, simply telling the kernel a lie about the packet size seems more robust than fudging HTB's rate tables.
>>>>
>>>> After the (regression) commit 56b765b79 ("htb: improved accuracy at high rates"), the kernel no longer uses the rate tables.
>>>
>>> See, I am quite a layman here; spelunking through the tc and kernel source code made me believe that the rate tables are still used (I might have looked at too-old versions of both repositories, though).
>>>
>>>> My commit 8a8e3d84b1719 (net_sched: restore "linklayer atm" handling) does the ATM cell overhead calculation directly on the packet length, see psched_l2t_ns() doing (DIV_ROUND_UP(len,48)*53). Thus, the cell calc should actually be more precise now... but see below.
>>>
>>> Is there any way to make HTB report which link layer it assumes?
>>
>> I added some print debug statements in my patch, so you can see this in the kernel log / dmesg. To activate the debugging print statements:
>>
>> mount -t debugfs none /sys/kernel/debug/
>> echo "func __detect_linklayer +p" > /sys/kernel/debug/dynamic_debug/control
>>
>> Then run your tc script, and run dmesg, or look in the kernel syslog.
>
> Ah, unfortunately I am not set up to build new kernels for the router I am testing on, so I would hereby like to beg Dave to include that patch in one of the next releases. Would it not be a good idea to teach tc to report the link layer for HTB as it does for stab? Having to empirically figure out whether it is applied or not is somewhat cumbersome...
>
>>>>>> Especially since the kernel already fudges the packet size to account for the ethernet header and then some, so this path should receive more scrutiny by virtue of having more users?
>>>>
>>>> As you mention, the default kernel path (not tc stab) fudges the packet size for Ethernet headers, AND I made a mistake (back in approx 2006, sorry) so that the "overhead" cannot be a negative number.
>>>
>>> Mmh, does this also apply to stab?
>>
>> This seems to be two questions...
>>
>> Yes, the Ethernet header size gets adjusted/added before the "stab" call.
>> For reference, see net/core/dev.c, function __dev_xmit_skb():
>>   Call qdisc_pkt_len_init(skb);         // adjust for Ethernet and account for GSO
>>   Call qdisc_calculate_pkt_len(skb, q); // is the stab call
>>   (which calls __qdisc_calculate_pkt_len() in net/sched/sch_api.c)
>>
>> The qdisc_pkt_len_init() call was introduced by Eric in v3.9-rc1~139^2~411.
>
> So I look at 3.10 here:
>
> net/core/dev.c, qdisc_pkt_len_init
> line 2628: qdisc_skb_cb(skb)->pkt_len = skb->len;
> and in
> line 2650: qdisc_skb_cb(skb)->pkt_len += (gso_segs - 1) * hdr_len;
> so the adjusted size does not seem to end up in skb->len
>
> and then in
> net/sched/sch_api.c, __qdisc_calculate_pkt_len
> line 440: pkt_len = skb->len + stab->szopts.overhead;
>
> So to my eyes this looks like stab is not honoring the changes made in qdisc_pkt_len_init, no? At least I fail to see where skb->len is assigned qdisc_skb_cb(skb)->pkt_len. But I happily admit that I am truly a novice in these matters and easily intimidated by C code.
>
>> Thus, in kernels >= 3.9, you would need to change/reduce your tc "overhead" parameter by -14 bytes (if you accounted for the encapsulated Ethernet header before).
>
> That is what I thought before, but my kernel spelunking made me reconsider and switch to not subtracting the 14 bytes, since as I understand it the kernel actively does not do this if stab is used.
>
>> The "overhead" of stab can be negative, so no problem here; it is an "int" for stab.
>
>>>> Meaning that some ATM encap overheads simply cannot be configured correctly (as you need to subtract the ethernet header).
>>>
>>> Yes, I see. Luckily PPPoA and IPoA seem quite rare, and setting the overhead to be larger than it actually is is relatively benign, as it will overestimate packet size. As a point of information, the entire UK uses PPPoA rather than PPPoE, and some hundreds of thousands of users use IPoA.
>>>
>>>> (And it's quite problematic to change the kABI to allow for a negative overhead)
>>>
>>> Again I have no clue, but overhead seems to be an integer, not unsigned, so why can it not be negative?
>>
>> Nope; for reference see include/uapi/linux/pkt_sched.h.
>>
>> This struct tc_ratespec is used by the normal "HTB/TBF" rate system; notice "unsigned short overhead".
>>
>> struct tc_ratespec {
>> 	unsigned char	cell_log;
>> 	__u8		linklayer; /* lower 4 bits */
>> 	unsigned short	overhead;
>> 	short		cell_align;
>> 	unsigned short	mpu;
>> 	__u32		rate;
>> };
>>
>> This struct tc_sizespec is used by the stab system, where the overhead is an int.
>>
>> struct tc_sizespec {
>> 	unsigned char	cell_log;
>> 	unsigned char	size_log;
>> 	short		cell_align;
>> 	int		overhead;
>> 	unsigned int	linklayer;
>> 	unsigned int	mpu;
>> 	unsigned int	mtu;
>> 	unsigned int	tsize;
>> };
>
> Ah, good to know.
>
>>>> Perhaps we should change to use "tc stab" for this reason. But I'm not sure "stab" does the right thing either, and its accuracy is also limited, as it is actually also table based.
>>>
>>> But why should a table be problematic here? As long as we can ensure the table is equal to or larger than the largest packet, we are golden. So either we do the manly and stupid thing and go for 9000-byte jumbo packets for the table size, or we assume that for the most part ATM users will at best use baby jumbo frames (I think BT does this to allow a payload MTU of 1500 in spite of the PPPoE encapsulation overhead), but then we are quite fine with the default size table maxMTU of 2048 bytes, no?
>>
>> It is the GSO problem that I'm worried about. The kernel will bunch up packets, and that causes the length calculation issue... just disable GSO:
>>
>> ethtool -K eth63 gso off gro off tso off
>
> Oh, as always Dave has this tackled in cerowrt already; all offloads are off (at 16000Kbit/s down / 2500Kbit/s up there should be no need for offloads nowadays, I guess). But I see the issue now.
>
>>>> We could easily change the kernel to perform the ATM cell overhead calc inside "stab", and we should also fix the GSO packet overhead problem.
>>>> (for now, remember to disable GSO when shaping)
>>>
>>> Yeah, I stumbled over the fact that the stab mechanism does not honor the kernel's earlier adjustments of packet length (but I seem to be unable to find the actual file and line where this is initially handled). It would seem relatively easy to make stab take the earlier adjustment into account. Regarding GSO, I assumed that GSO will not play nicely with an AQM anyway, as a single large packet will hog too much transfer time...
>>
>> Yes, just disable GSO ;-)
>
> Done.
>
>>>>> It's my hope that the atm code works but is misconfigured. You can output the tc commands by overriding the TC variable with TC="echo tc" and paste here.
>>>>
>>>> I also hope it is a misconfig. Please show us the config/script.
>>>
>>> Will do this later. I would be delighted if it is just me being stupid.
>>>
>>>> I would appreciate a link to the scripts you are using... perhaps a git tree?
>>>
>>> Unfortunately I have no git tree and no experience with git. I do not think I will be able to set something up quickly. But I use a modified version of cerowrt's AQM scripts, which I will post later.
>>
>> Someone just point me to the cerowrt git repo... please, and point me at the simple.qos script.
>
> <simple.qos><functions.sh>
>
>> Did you add the "linklayer atm" yourself to Dave's script?
>
> Well, partly: the option for HTB was already in his script but untested. I changed the script to add stab and to allow easier configuration of overhead, mpu, mtu and tsize (just for stab) from the GUI, but the code is Dave's. I attached the scripts. functions.sh gets the values from the configuration GUI. I extended the way the linklayer option strings are created, but basically it is the same method that Dave used. And I do see the right overhead values appear in "tc -d qdisc", so at least something is reaching HTB.
> Sorry that I have no repository for easier access.
>
>>>>>> Now, I have been testing this using Dave's most recent cerowrt alpha version with a 3.10.9 kernel on mips hardware. I think this kernel should contain all htb fixes, including commit 8a8e3d84b17 (net_sched: restore "linklayer atm" handling), but am not fully sure.
>>>>>
>>>>> It does.
>>>>
>>>> It has not hit the stable tree yet, but DaveM promised he would pass it along.
>>>>
>>>> It does seem Dave Taht has my patch applied:
>>>> http://snapon.lab.bufferbloat.net/~cero2/patches/3.10.9-1/685-net_sched-restore-linklayer-atm-handling.patch
>>>
>>> Ah, good, so it should have worked.
>>
>> It should...
>>
>>>>>> While I am not able to build kernels, it seems that I am able to quickly test whether link layer adjustments work or not. So I am happy to help where I can :)
>>>>
>>>> So, what is your lab setup that allows you to test this quickly?
>>>
>>> Oh, Dave and Toke are the giants on whose shoulders I stand here (thanks, guys); all I bring to the table basically is the fact that I have an ATM-carried ADSL2+ connection at home.
>>
>> I will soon have an ADSL lab again, so I will try to reproduce your results. Actually, almost while typing this email, the postman arrived at my house and delivered a new ADSL modem... as a local Danish ISP, www.fullrate.dk, has been so kind to give me a test line for free (thanks, fullrate!).
>
> This is great! Even though I am quite sure that no real DSL link is actually required to test the effect of the link layer adjustments.
>
>>> Anyway, my theory is that proper link layer adjustments should only show up if not performing them would make my traffic exceed my link speed and hence accumulate in the DSL modem's bloated buffers, leading to measurable increases in latency.
>>> So I try to saturate both the up- and downlink while measuring latency under different conditions. Since the worst-case overhead of the ATM encapsulation approaches 50% (with the best case being around 10%), I test the system while shaping to 95% of the link rates, where I do expect to see an effect of the link layer adjustments, and while shaping to 50%, where I do not expect to see an effect. And basically that seems to work.
>>>
>>> Practically, I use Toke's netperf-wrapper project with the RRUL test from my cerowrt router behind an ADSL2+ modem to a close netperf server in Germany. The link layer adjustments are configured in my cerowrt router, using Dave's simple.qos script (3-band HTB shaper with fq_codel on each leaf, taking my overhead of 40 bytes into account and optionally the link layer).
>>>
>>> It turns out that this test nicely saturates my link with 4 up and 4 down TCP flows and uses a train of ping probes at a 0.2-second period to assess the latency induced by saturating the links. Now I shape down to 95% and 50% of line rates and simply look at the ping RTT plot for different conditions. In my rig I see around 30ms ping RTT without load, 80ms with full saturation and no link layer adjustments, and 40ms with working link layer adjustments (hand in hand with slightly reduced TCP goodput, just as one would expect). In my testing so far, activating the HTB link layer adjustments yielded the same 80ms delay I get without link layer adjustments. If I shape down to 50% of link rates, HTB, stab and no link layer adjustments all yield a ping RTT of ~40ms. Still, with proper link layer adjustments the TCP goodput is reduced even at 50% shaping. As Dave explained, with an unloaded swallow, ermm, ping RTT and fq_codel's target set to 5ms, the best case would be 30ms + 2*5ms or 40ms, so I am pretty close to ideal with proper link layer adjustments.
>>>
>>> I guess it should be possible to simply use the reduction in goodput as an easy indicator of whether the link layer adjustments work or not. But to do this properly I would need to be able to control the size of the sent packets, which I am not, at least not with RRUL. But I am quite sure real computer scientists could easily set something up to test the goodput through a shaping device at differently sized packet streams of the same bandwidth, but I digress.
>>>
>>> On the other hand, I do not claim to be an expert in this field in any way, and my measurement method might be flawed; if you think so, please do not hesitate to let me know how I could improve it.
>>
>> I have a hard time following your description... sorry.
>
> Okay, I see; let me try to present the data in a more ordered fashion:
> BW: bandwidth [Kbit/s]
> LLAM: link-layer adjustment method
> LL: link layer
> GP: goodput [Kbit/s]
>
> #   shaped   downBW (%)    upBW (%)    LLAM   LL     ping RTT   downGP   upGP
> 1   no       16309 (100)   2544 (100)  none   none   300ms      10000    1600
> 2   yes      14698 (90)    2430 (95)   none   none    80ms      13600    1800
> 3   yes      14698 (90)    2430 (95)   stab   adsl    40ms      11600    1600
> 4   yes      15494 (95)    2430 (95)   stab   adsl    42ms      12400    1600
> 5   yes      14698 (90)    2430 (95)   htb    adsl    75ms      13200    1600
>
> 2   yes       7349 (45)    1215 (48)   none   none    45ms       6800    1000
> 4   yes       7349 (45)    1215 (48)   stab   adsl    42ms       5800     800
> 5   yes       7349 (45)    1215 (48)   htb    adsl    45ms       6600    1000
>
> Notes: upGP is way noisier than downGP and therefore harder to estimate.
>
> Conditions 3 and 4 show the best latency at high link saturation, where the link layer adjustments actually make a difference by controlling whether the DSL modem will buffer or not.
> At ~50% link saturation there is not much, if any, effect of the link layer adjustments on latency, but they still leave their hallmark in the goodput reduction. (The partial reduction for htb might be caused by the specified 40 bytes of overhead, which seem to have been honored.)
> I take the disappearance of the latency effect at 50% as a control data point that shows my measurement approach seems sane enough.
> I hope this clears up the information I wanted to give you the first time around.
>
>> So, did you get a working low-latency setup by using 95% shaping and "stab" linklayer adjustment?
>
> Yes. (With a 3-leaf HTB as shaper and fq_codel as leaf qdisc.)
>
> Best Regards
>   Sebastian
>
>> --
>> Best regards,
>>   Jesper Dangaard Brouer
>>   MSc.CS, Sr. Network Kernel Developer at Red Hat
>>   Author of http://www.iptv-analyzer.org
>>   LinkedIn: http://www.linkedin.com/in/brouer
>
> _______________________________________________
> Cerowrt-devel mailing list
> [email protected]
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
