Hi,

just to document my current understanding of using SQM on a router that also
terminates a PPPoE WAN connection. We basically have two options: either set up
SQM on the real interface (let’s call it ge00, like cerowrt does) or on the
associated PPP device, pppoe-ge00. In theory both should produce the same
results; in practice, current SQM gives significantly different results. Let me
enumerate the main differences that show up when testing with netperf-wrapper’s
RRUL test:

1) SQM on ge00 does not show a working egress classification in the RRUL test 
(no visible “banding”/stratification of the 4 different priority TCP flows), 
while SQM on pppoe-ge00 does show this stratification.

        The reason for this becomes obvious once we take into account that on
ge00 the kernel sees packets that already carry a PPP header between the
Ethernet and IP headers and have a different ether_type field. Our diffserv
filters currently ignore everything except plain IPv4 and IPv6 packets, so due
to the unexpected/unhandled PPP header everything lands in the default priority
class, hence no stratification. If we shape on pppoe-ge00, the kernel seems to
do all processing before encapsulating the data with PPP, so all filters just
work. In theory this should be relatively easy to fix (at least for the
specific PPPoE case; I am unsure about a generic solution) by using offsets to
access the TOS bits in PPP packets. Most likely we also face the same issue
with other encapsulations that pass through cerowrt to some degree (except
most of those use an outer IP header from which we can scrape the DSCPs…, but
I digress).
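To illustrate the offset idea, here is a minimal userspace sketch in Python
(not a tc filter) that looks through a PPPoE session header to find the DSCP.
It assumes an untagged Ethernet frame (no VLAN tag), the standard 6-byte PPPoE
session header and an uncompressed 2-byte PPP protocol field, so the IP header
starts 8 bytes later than in a plain IPv4/IPv6 frame. The helper names are
hypothetical.

```python
import struct
from typing import Optional

ETH_HLEN = 14            # dst MAC + src MAC + ether_type (no VLAN tag assumed)
ETH_P_IP = 0x0800        # plain IPv4
ETH_P_IPV6 = 0x86DD      # plain IPv6
ETH_P_PPP_SES = 0x8864   # PPPoE session stage
PPPOE_HLEN = 6           # ver/type, code, session id, length
PPP_PROTO_LEN = 2        # uncompressed PPP protocol field (assumption)
PPP_IP = 0x0021          # PPP protocol number for IPv4
PPP_IPV6 = 0x0057        # PPP protocol number for IPv6


def dscp_from_ip(packet: bytes, offset: int, is_v6: bool) -> Optional[int]:
    """Return the DSCP of the IP header starting at `offset`, or None if truncated."""
    if len(packet) < offset + 2:
        return None
    if is_v6:
        # IPv6 Traffic Class straddles bytes 0 and 1, after the 4-bit version.
        tclass = ((packet[offset] & 0x0F) << 4) | (packet[offset + 1] >> 4)
    else:
        # IPv4 TOS byte is the second byte of the header.
        tclass = packet[offset + 1]
    return tclass >> 2  # upper 6 bits are the DSCP, lower 2 bits are ECN


def dscp_from_frame(frame: bytes) -> Optional[int]:
    """Extract the DSCP from an Ethernet frame, looking through PPPoE if present."""
    if len(frame) < ETH_HLEN:
        return None
    (ether_type,) = struct.unpack_from("!H", frame, 12)
    if ether_type == ETH_P_IP:
        return dscp_from_ip(frame, ETH_HLEN, is_v6=False)
    if ether_type == ETH_P_IPV6:
        return dscp_from_ip(frame, ETH_HLEN, is_v6=True)
    if ether_type == ETH_P_PPP_SES:
        proto_off = ETH_HLEN + PPPOE_HLEN
        if len(frame) < proto_off + PPP_PROTO_LEN:
            return None
        (ppp_proto,) = struct.unpack_from("!H", frame, proto_off)
        ip_off = proto_off + PPP_PROTO_LEN  # IP header shifted by 8 bytes
        if ppp_proto == PPP_IP:
            return dscp_from_ip(frame, ip_off, is_v6=False)
        if ppp_proto == PPP_IPV6:
            return dscp_from_ip(frame, ip_off, is_v6=True)
    return None  # anything else would end up in the default class
```

A filter that only checks for ether_type 0x0800/0x86DD corresponds to the first
two branches; the PPPoE branch is the part our diffserv filters currently lack.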

2) SQM on ge00 shows better latency under load (LUL): the LUL increases by
roughly 2 * fq_codel’s target, i.e. ~10 ms, while SQM on pppoe-ge00 shows a LUL
increase (LULI) roughly twice as large, around 20 ms.

        I have no idea why that is; if anybody has an idea, please chime in.

3) SQM on pppoe-ge00 achieves a roughly 20% higher egress rate than SQM on ge00
(with ingress more or less identical between the two). Also, 2) and 3) do not
seem to be coupled: artificially reducing the egress rate on pppoe-ge00 to
match the egress rate seen on ge00 does not reduce the LULI to the 10 ms typical
for ge00; it stays at 20 ms.

        For this I also have no good hypothesis; any ideas?


So the current choice is to accept either a noticeable increase in LULI
(though note that some years ago even an average of 20 ms was most likely rare
in real life) or an equally noticeable decrease in egress bandwidth…

Best Regards
        Sebastian

P.S.: It turns out, at least on my link, that when shaping on pppoe-ge00 the
kernel does not account for any header automatically, so I need to specify a
per-packet overhead (PPOH) of 40 bytes (on an ADSL2+ link with ATM link layer);
when shaping on ge00, however (with the kernel still terminating the PPPoE link
to my ISP), I only need to specify a PPOH of 26, as the kernel already adds the
14 bytes for the Ethernet header…
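The overhead arithmetic above can be sketched as follows, assuming the usual
ATM framing of 48 payload bytes per 53-byte cell with the last cell padded
(roughly what the kernel’s size table models with `linklayer atm` and an
`overhead` value). Taking the P.S. at face value, the shaper on ge00 sees 14
more bytes per packet than the one on pppoe-ge00, so a PPOH of 40 there and 26
on ge00 should account every packet identically. The function and constant
names are hypothetical.

```python
ATM_CELL_PAYLOAD = 48  # payload bytes per ATM cell
ATM_CELL_SIZE = 53     # bytes per ATM cell on the wire (48 payload + 5 header)


def atm_wire_bytes(seen_len: int, overhead: int) -> int:
    """Bytes the ATM/ADSL link actually transmits for one packet.

    seen_len: packet length as seen by the shaper on the interface
    overhead: per-packet overhead (PPOH) configured for that interface
    """
    payload = seen_len + overhead
    cells = -(-payload // ATM_CELL_PAYLOAD)  # ceiling division: last cell is padded
    return cells * ATM_CELL_SIZE


# Values from the P.S.: the shaper on ge00 is assumed to see the 14-byte
# Ethernet header in addition to what the shaper on pppoe-ge00 sees.
ETH_HDR_SEEN_ON_GE00 = 14
PPOH_PPPOE_GE00 = 40
PPOH_GE00 = 26

if __name__ == "__main__":
    for ip_len in (40, 576, 1492):
        on_ppp = atm_wire_bytes(ip_len, PPOH_PPPOE_GE00)
        on_ge00 = atm_wire_bytes(ip_len + ETH_HDR_SEEN_ON_GE00, PPOH_GE00)
        print(f"{ip_len:5d} B IP packet -> {on_ppp} B on the wire "
              f"(interfaces agree: {on_ppp == on_ge00})")
```

For example, a 40-byte IP packet (a bare TCP ACK) plus 40 bytes of overhead
needs two cells, i.e. 106 bytes on the wire, so a wrong PPOH hurts most for
small packets.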

_______________________________________________
Cerowrt-devel mailing list
https://lists.bufferbloat.net/listinfo/cerowrt-devel