Re: [Cerowrt-devel] Update to "Setting up SQM for CeroWrt 3.10" web page. Comments needed.

Fred Stratton Sat, 28 Dec 2013 12:10:12 -0800


On 28/12/13 19:54, Sebastian Moeller wrote:

Hi Fred,



On Dec 28, 2013, at 15:27 , Fred Stratton <[email protected]> wrote:

On 28/12/13 13:42, Sebastian Moeller wrote:

Hi Fred,


On Dec 28, 2013, at 12:09 , Fred Stratton
<[email protected]>
  wrote:

The UK consensus fudge factor has always been 85 per cent of the rate
achieved, not 95 or 99 per cent.

        I know that the recommendations have been lower in the past; I think this is partly because,
before Jesper Brouer's and Russell Stuart's work to properly account for ATM
"quantization", people typically had to deal with a ~10% rate tax for the 5 byte per cell
overhead (48 byte payload in 53 byte cells, 90.57% usable rate) plus an additional 5% to
stochastically account for the padding of the last cell and the per packet overhead. Both of
these affect the effective goodput far more for small than for large packets, so the 85% never
worked well for all packet sizes. My hypothesis now is that, since we can and do properly
account for these effects of ATM framing, we can afford to start with a fudge factor of 90% or
even 95%. As far as I know the recommended fudge factors are never explained by more than
"this works empirically"...

The fudge factors are totally empirical. If you are proposing a more formal
approach, I shall try a 90 per cent fudge factor, although 'current rate'
varies here.

        My hypothesis is that we can get away with less fudge as we have a 
better handle on the actual wire size. Personally, I do start at 95% to figure 
out the trade-off between bandwidth loss and latency increase.

You are now saying something slightly different. You are implying now that you are starting at 95 per cent, and then reducing the nominal download speed until you achieve an unspecified endpoint.

Devices express 2 values: the sync rate - or 'maximum rate attainable' - and 
the dynamic value of 'current rate'.

        The actual data rate is the relevant information for shaping. Often DSL modems
report the link capacity as "maximum rate attainable" or some such, while the
actual bandwidth is limited by contract to a rate below what the line would support
(often this bandwidth reduction is performed on the PPPoE link to the BRAS).

As the sync rate is fairly stable for any given installation - ADSL or Fibre -
this could be used as a starting value, decremented by the traditional 15 per
cent of 'overhead', and the 85 per cent fudge factor applied to that.

        I would like to propose to use the "current rate" as starting point, as 
'maximum rate attainable' >= 'current rate'.

'current rate' is still a sync rate, and so is conventionally viewed as 15 per 
cent above the unmeasurable actual rate.

        No no, the current rate really is the current link capacity between 
modem and DSLAM (or CPE and CTS), only this rate typically is for the raw ATM 
stream, so we have to subtract all the additional layers until we reach the IP 
layer...


You are saying the same thing as I am.

As you are proposing a new approach, I shall take 90 per cent of 'current rate' 
as a starting point.

        I would love to learn how that works out for you. Because for all my
theories about why 85% was used, the proof still is in the (plum-) pudding...

No one in the UK uses SRA currently. One small ISP used to.

        That is sad, because on paper SRA looks like a good feature to have 
(lower bandwidth sure beats synchronization loss).

The ISP I currently use has Dynamic Line Management, which changes target SNR 
constantly.

        Now that is much better, as we should neither notice nor care; I assume
that this happens on layers below ATM even.

The DSLAM is made by Infineon.

Fibre - FTTC - connections can suffer quite large download speed fluctuations 
over the 200 - 500 metre link to the MSAN.  This phenomenon is not confined to 
ADSL links.

        On the actual xDSL link? As far as I know no telco actually uses SRA
(seamless rate adaptation or so), so the current link speed will only get lower,
not higher. I would therefore expect a relatively stable current rate (it might take a
while, a few days, to slowly degrade to the highest link speed
supported under all conditions, but I hope you still get my point).

I understand the point, but do not think it is the case, from data I have seen, 
but cannot find now, unfortunately.

        I see, maybe my assumption here is wrong, I would love to see data 
though before changing my hypothesis.

An alternative speed test is something like this


http://download.bethere.co.uk/downloadMeter.html


which, as Be has been bought by Sky, may not exist after the end of April 2014.

        But, if we recommend to run speed tests we really need to advise our 
users to start several concurrent up- and downloads to independent servers to 
actually measure the bandwidth of our bottleneck link; often a single server 
connection will not saturate a link (I seem to recall that with TCP it is 
guaranteed to only reach 75% or so averaged over time, is that correct?).
        But I think this is not the proper way to set the bandwidth for the
shaper, because upstream of our link to the ISP we have no guaranteed bandwidth
at all and can just hope the ISP is doing the right thing AQM-wise.

I quote the Be site as an alternative to a java based approach. I would be very 
happy to see your suggestion adopted.

        • [What is the proper description here?] If you use PPPoE (but not over 
ADSL/DSL link), PPPoATM, or bridging that isn’t Ethernet, you should choose 
[what?] and set the Per-packet Overhead to [what?]

For a PPPoA service, the PPPoA link is treated as PPPoE on the second device, 
here running ceroWRT.

        This still means you should specify the PPPoA overhead, not PPPoE.

I shall try the PPPoA overhead.

        Great, let me know how that works.

The packet overhead values are written in the dubious man page for tc_stab.

        The only real flaw in that man page, as far as I know, is the fact that
it indicates that the kernel will account for the 18 byte ethernet header
automatically, while the kernel does no such thing (which I hope to change).

It mentions link layer types as 'atm', 'ethernet' and 'adsl'. There is no
reference anywhere to the last. I do not see its relevance.

        If you have a look inside the source code for tc and the kernel, you
will notice that atm and adsl are aliases for the same thing. I just think that
we should keep naming the thing ATM since that is the problematic layer in the
stack that causes most of the usable link rate misjudgements; adsl just happens
to use ATM exclusively.


I have reviewed the source. I see what you mean.

Sebastian has a potential alternative method of formal calculation.

        So, I have no formal calculation method available, but an empirical way 
of detecting ATM quantization as well as measuring the per packet overhead of 
an ATM link.
        The idea is to measure the RTT of ICMP packets of increasing length and then
display the distribution of RTTs by ICMP packet length. On an ATM carrier we expect to
see a step function with steps 48 bytes apart; for a non-ATM carrier we expect to
see a smooth ramp. We compare the residuals of a linear fit of the data with the
residuals of the best step function fit to the data, and the fit with the lower residuals
"wins". Attached you will find an example of this approach: ping data in red
(median of NNN repetitions for each ICMP packet size), linear fit in blue, and best
staircase fit in green. You will notice that the data starts somewhere inside a 48 byte ATM cell.
Since the ATM encapsulation overhead is at most 44 bytes and we know the IP and ICMP
overhead of the ping probe, we can calculate the overhead preceding the IP header, which
is what needs to be put in the overhead field in the GUI. (Note where the green line
intersects the y-axis at 0 bytes packet size? This is where the IP header
starts; the "missing" part of this ATM cell is the overhead.)
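
        The overhead arithmetic in that last step can be sketched in shell. This is
only an illustration and not the attached analysis code: the function name is made up,
and it assumes a 20 byte IPv4 header without options, an 8 byte ICMP header, and that
the argument is the largest ICMP payload size (the value given to ping -s) that still
sits on the lower RTT step, so that the ATM cells are exactly full at that size.

```shell
#!/bin/bash
# Hypothetical helper (not part of the attached scripts).
# $1: largest ICMP payload size (ping -s value) that still fits into the
#     current ATM cell, i.e. the last size before the RTT steps up.
# Assumes IPv4 without options (20 bytes) plus an ICMP header (8 bytes).
overhead_from_step() {
    local payload=$1
    local ip_size=$(( payload + 20 + 8 ))
    # at a step the cells are exactly full: (overhead + ip_size) % 48 == 0
    echo $(( (48 - ip_size % 48) % 48 ))
}

# a step right after 28 bytes of payload would suggest 40 bytes of overhead
overhead_from_step 28
```

A result above 44 would not match any of the known encapsulations and probably
means the step position was misread.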

You are curve fitting. This is calculation.

        I see, that is certainly a valid way to look at it, just one that had 
not occurred to me.






        Believe it or not, this method works reasonably well (I tested
successfully with one Bridged, LLC/SNAP RFC-1483/2684 connection (overhead 32
bytes), and several PPPoE, LLC (overhead 40) connections (from ADSL1 @
3008/512 to ADSL2+ @ 16402/2558)). But it takes a relatively long time to measure
the ping train, especially at the higher rates… and it requires ping time stamps
with decent resolution (which rules out Windows), and my naive data acquisition
script creates really large raw data files. I guess I should post the code
somewhere so others can test and improve it.
        Fred, I would be delighted to get a data set from your connection, to
test a known different encapsulation.

I shall try this. If successful, I shall initially pass you the raw data.

        Great, but be warned this will be hundreds of megabytes. (For 
production use the measurement script would need to prune the generated log 
file down to the essential values… and potentially store the data in binary)
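
        One possible pruning step, sketched in shell with awk. The function name and
the two-column output format are hypothetical, and nothing here says the existing
octave/matlab parser would accept the pruned file. It assumes reply lines of the
usual ping form, e.g. "24 bytes from 192.0.2.1: icmp_seq=1 ttl=57 time=12.3 ms",
where the byte count is the ICMP payload size plus the 8 byte ICMP header.

```shell
#!/bin/bash
# Hypothetical helper: reduce a raw ping log to "<bytes> <rtt_ms>" pairs.
# $1: path to the raw log file.
prune_ping_log() {
    awk '/bytes from/ {
        # only reply lines carry a "time=<rtt>" field; skip headers and statistics
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^time=/) {
                print $1, substr($i, 6)
            }
        }
    }' "$1"
}
```

Called as "prune_ping_log raw_log.txt > pruned_log.txt" (both file names are
placeholders), this keeps one short line per received reply and drops the rest.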

I have not used MatLab since the 1980s.

        Lucky you, I sort of have to use matlab in my day job and hence am most
"fluent" in matlabese, but the code should also work with octave (I tested
version 3.6.4) so it should be relatively easy to run the analysis yourself. That said, I
would love to get a copy of the ping sweep :)

TYPICAL OVERHEADS
        The following values are typical for different adsl scenarios (based on
        [1] and [2]):

        LLC based:
            PPPoA - 14 (PPP - 2, ATM - 12)
            PPPoE - 40+ (PPPoE - 8, ATM - 18, ethernet 14, possibly FCS - 4+padding)
            Bridged - 32 (ATM - 18, ethernet 14, possibly FCS - 4+padding)
            IPoA - 16 (ATM - 16)

        VC Mux based:
            PPPoA - 10 (PPP - 2, ATM - 8)
            PPPoE - 32+ (PPPoE - 8, ATM - 10, ethernet 14, possibly FCS - 4+padding)
            Bridged - 24+ (ATM - 10, ethernet 14, possibly FCS - 4+padding)
            IPoA - 8 (ATM - 8)


For VC Mux based PPPoA, I am currently using an overhead of 18 for the PPPoE 
setting in ceroWRT.

        Yeah, we could put this list into the wiki, but how shall a typical user
figure out which encapsulation is used? And good luck in figuring out whether
the frame check sequence (FCS) is included or not…
BTW, regarding the 18: I predict that if PPPoE is only used between cerowrt and the "modem" or
gateway, your effective overhead should be 10 bytes; I would love it if you could run
the following against your link at night (also attached):

#! /bin/bash
# TODO use seq or bash to generate a list of the requested sizes
# (to allow for non-equidistantly spaced sizes)

TECH=ADSL2      # just to give some meaning to the ping trace file name
# finding a proper target IP is somewhat of an art: just traceroute a remote site
# and find the nearest host reliably responding to pings, showing the smallest
# variation of ping times
TARGET=${1}     # the IP against which to run the ICMP pings
if [ -z "${TARGET}" ]
then
     echo "Usage: $0 TARGET_IP" >&2
     exit 1
fi
DATESTR=$(date +%Y%m%d_%H%M%S)  # to allow multiple sequential records
LOG=ping_sweep_${TECH}_${DATESTR}.txt

# by default non-root ping will only send one packet per second, so work around
# that by calling ping independently for each packet
# empirically figure out the shortest period still giving the standard ping
# time (to avoid being slow-pathed by our target)
PINGPERIOD=0.01         # in seconds
PINGSPERSIZE=10000

# Start, needed to find the per packet overhead dependent on the ATM encapsulation
# to reliably show ATM quantization one would like to see at least two steps,
# so cover a range > 2 ATM cells (so > 96 bytes)
SWEEPMINSIZE=16         # 64bit systems seem to require 16 bytes of payload to include a timestamp...
SWEEPMAXSIZE=116

n_SWEEPS=$(( SWEEPMAXSIZE - SWEEPMINSIZE ))     # span of the sweep (currently unused)

i_sweep=0
i_size=0

echo "Running ICMP RTT measurement against: ${TARGET}"
while [ ${i_sweep} -lt ${PINGSPERSIZE} ]
do
     (( i_sweep++ ))
     echo "Current iteration: ${i_sweep}"
     # now loop from sweepmin to sweepmax
     i_size=${SWEEPMINSIZE}
     while [ ${i_size} -le ${SWEEPMAXSIZE} ]
     do
        echo "${i_sweep}. repetition of ping size ${i_size}"
        # run each ping in the background so a slow reply does not stall the sweep
        ping -c 1 -s ${i_size} "${TARGET}" >> "${LOG}" &
        (( i_size++ ))
        # we need a sleep binary that allows non integer times (GNU sleep is
        # fine as is sleep of macosx 10.8.4)
        sleep ${PINGPERIOD}
     done
done
wait    # let the last background pings finish before reporting completion
echo "Done... ($0)"


This will try to run 10000 repetitions for ICMP packet sizes from 16 to 116
bytes, taking (10000 * 101 * 0.01 / 60 =) 168 minutes, but you should be able
to stop it with ctrl-c if you are not patient enough. With your link I would
estimate that 3000 should be plenty, but if you could run it overnight that
would be great, and then ~3 hours should not matter much.
        And then run the attached code in octave or matlab. Invoke with
"tc_stab_parameter_guide_03('path/to/the/data/file/you/created/name_of_said_file')".
The parser will run on the first invocation and is really, really slow, but further
invocations should be faster. If issues arise, let me know, I am happy to help.

Were I to use a single directly connected gateway, I would input a suitable 
value for PPPoA in that openWRT firmware.

        I think you should do that right now.

The firmware has not yet been released.

In theory, I might need to use a negative value, but the current kernel does
not support that.

        If you use tc_stab, negative overheads are fully supported, only 
htb_private has overhead defined as unsigned integer and hence does not allow 
negative values.

Jesper Brouer posted about this. I thought he was referring to tc_stab.

        I recall having a discussion with Jesper about this topic, where he 
agreed that tc_stab was not affected, only htb_private.

Reading what was said on 23rd August, you corrected his error in interpretation.

I have used many different arbitrary values for overhead. All appear to have 
little effect.

        So the issue here is that only at small packet sizes do the overhead and last
cell padding eat a disproportionate amount of your bandwidth (64 byte packet plus 44 byte
overhead plus 47 byte worst case cell padding: 100*(44+47+64)/64 = 242% effective packet
size compared to what the shaper estimated). At typical packet sizes the maximum error is
small (44 bytes of missing overhead and potentially misjudged cell padding of 47 bytes add up
to a theoretical 100*(44+47+1500)/1500 = 106% effective packet size compared to what the
shaper estimated). It is obvious that at 1500 byte packets the whole ATM issue can be easily
dismissed by just reducing the link rate by ~10% for the 48 in 53 framing and an
additional ~6% for overhead and cell padding. But once you mix smaller packets into your
traffic, say for VoIP, the effective wire size misjudgment will kill your ability to
control the queueing. Note that the common wisdom of shaping down to 85% might stem from the
~15% ATM "tax" on 1500 byte traffic...
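
        The cell arithmetic behind these percentages can be checked with a few lines
of shell. This is only a sketch with a made-up function name. It computes the exact
cell count for one given packet size and overhead (rather than the worst case bound
used above) and counts all 53 bytes of each cell as wire size.

```shell
#!/bin/bash
# Hypothetical helper: number of ATM cells needed for one packet.
# $1: IP packet size in bytes, $2: per packet overhead in bytes.
atm_cells() {
    # each cell carries 48 payload bytes; the last cell is padded, so round up
    echo $(( ($1 + $2 + 47) / 48 ))
}

for size in 64 1500
do
    cells=$(atm_cells "$size" 44)
    # 53 bytes on the wire per cell (48 payload + 5 header)
    echo "${size} byte packet: ${cells} cells, $(( cells * 53 )) bytes on the wire"
done
```

With 44 bytes of overhead the 64 byte packet needs 3 cells (159 bytes on the wire)
and the 1500 byte packet needs 33 cells (1749 bytes), which shows how much harder
the small packet is hit.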

As I understand it, the current recommendation is to use tc_stab in preference 
to htb_private. I do not know the basis for this value judgement.

        In short: tc_stab allows negative overheads, and tc_stab works with HTB,
TBF and HFSC, while htb_private only works with HTB. Currently htb_private has two
advantages: it will estimate the per packet overhead correctly if GSO (generic
segmentation offload) is enabled, and it will produce exact ATM link layer
estimates for all possible packet sizes. In practice almost everyone uses an
MTU of 1500 or less for their internet access, making both htb_private
advantages effectively moot. (Plus, if no one beats me to it, I intend to address
both theoretical shortcomings of tc_stab next year.)
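
        For illustration, a root qdisc line using a size table might look like the
output of the sketch below. The stab, overhead and linklayer keywords are the ones
documented in the tc-stab man page; the interface name ge00, the overhead of 40, the
handle and the HTB default class are assumptions picked for the example, and the
helper only prints the command instead of running it (running it needs root and a
full class setup).

```shell
#!/bin/bash
# Hypothetical helper: print (not run) a root qdisc line with a size table.
# $1: interface name, $2: per packet overhead in bytes (may be negative).
print_stab_cmd() {
    echo "tc qdisc add dev $1 root handle 1: stab overhead $2 linklayer atm htb default 10"
}

print_stab_cmd ge00 40
```

The same stab options can be placed in front of TBF or HFSC instead of HTB, which
is the flexibility referred to above.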

Best Regards
        Sebastian




On 28/12/13 10:01, Sebastian Moeller wrote:

Hi Rich,

great! A few comments:

Basic Settings:
[Is 95% the right fudge factor?] I think that ideally, if we can precisely
measure the usable link rate, even 99% of that should work out well to keep
the queue in our device. I assume that, due to the difficulties in measuring and
accounting for link properties such as link layer and overhead, people typically
rely on setting the shaped rate a bit lower than required to
stochastically/empirically account for the link properties. I predict that if
we give the shaper a correct description of the link properties we should be
fine with 95% shaping. Note though, it is not trivial on an ADSL link to get
the actually usable bit rate from the modem, so 95% of what can be deduced from
the modem or the ISP's invoice might be a decent proxy…

[Do we have a recommendation for an easy way to tell if it's working? Perhaps a 
link to a new Quick Test for Bufferbloat page. ] The linked page looks like a 
decent probe for buffer bloat.

Basic Settings - the details...

CeroWrt is designed to manage the queues of packets waiting to be sent across 
the slowest (bottleneck) link, which is usually your connection to the Internet.

        I think we can only actually control the first link to the ISP, which 
often happens to be the bottleneck. At a typical DSLAM (xDSL head end station) 
the cumulative sold bandwidth to the customers is larger than the back bone 
connection (which is called over-subscription and is almost guaranteed to be 
the case in every DSLAM) which typically is not a problem, as typically people 
do not use their internet that much. My point being we can not really control 
congestion in the DSLAM's uplink (as we have no idea what the reserved rate per 
customer is in the worst case, if there is any).

CeroWrt can automatically adapt to network conditions to improve the 
delay/latency of data without any settings.

        Does this describe the default fq_codels on each interface (except 
fib?)?

However, it can do a better job if it knows more about the actual link speeds 
available. You can adjust this setting by entering link speeds that are a few 
percent below the actual speeds.

Note: it can be difficult to get an accurate measurement of the link speeds. 
The speed advertised by your provider is a starting point, but your experience 
often won't meet their published specs. You can also use a speed test program 
or web site like

http://speedtest.net

  to estimate actual operating speeds.

        While this approach is commonly recommended on the internet, I do not
believe that it is that useful. Between a user and the speedtest site there are a
number of potential congestion points that can affect (reduce) the throughput,
like bad peering. That said, the speedtest will report something <= the
actual link speed and hence be conservative. (Interactivity stays great at 90% of
link rate as well as at 80%, so underestimating the bandwidth within reason does not
affect the latency gains from traffic shaping; it just sacrifices a bit more
bandwidth. Given the difficulty of measuring the actually attainable
bandwidth, this might have been effectively a decent recommendation even though the
theory of it seems flawed.)

Be sure to make your measurement when network is quiet, and others in your home 
aren’t generating traffic.

        This is great advice.

I would love to comment further, but after reloading

http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310

  just returns a blank page and I can not get back to the page as of yesterday 
evening… I will have a look later to see whether the page resurfaces…

Best
        Sebastian


On Dec 27, 2013, at 23:09 , Rich Brown

<[email protected]>

  wrote:

You are a very good writer and I am on a tablet.

Thanks!

I'll take a pass at the wiki tomorrow.

The shaper does up and down was my first thought...

Everyone else… Don’t let Dave hog all the fun! Read the tech note and give 
feedback!

Rich

On Dec 27, 2013 10:48 AM, "Rich Brown" <[email protected]>

  wrote:
I updated the page to reflect the 3.10.24-8 build, and its new GUI pages.



http://www.bufferbloat.net/projects/cerowrt/wiki/Setting_up_AQM_for_CeroWrt_310



There are still lots of open questions. Comments, please.

Rich
_______________________________________________
Cerowrt-devel mailing list


[email protected]
https://lists.bufferbloat.net/listinfo/cerowrt-devel
