Hi,
Just to document my current understanding of using SQM on a router that also
terminates a PPPoE WAN connection. We basically have two options: set up SQM
either on the real interface (let’s call it ge00, like cerowrt does) or on the
associated PPP device, pppoe-ge00. In theory both should produce the same
results; in practice, current SQM produces significantly different results. Let
me enumerate the main differences that show up when testing with
netperf-wrapper’s RRUL test:
1) SQM on ge00 does not show a working egress classification in the RRUL test
(no visible “banding”/stratification of the 4 different priority TCP flows),
while SQM on pppoe-ge00 does show this stratification.
Now the reason for this is quite obvious once we take into account that on
ge00 the kernel sees a frame that already carries the PPPoE/PPP headers between
the Ethernet and IP headers, and therefore a different ether_type (0x8864,
PPPoE session) than plain IP. Our diffserv filters currently ignore everything
except plain IPv4 and IPv6 packets, so due to the unexpected/unhandled PPP
header everything lands in the default priority class, hence no
stratification. If we shape on pppoe-ge00, the kernel seems to do all
processing before encapsulating the data with PPP, so all filters just work.
In theory that should be relatively easy to fix (at least for the specific
PPPoE case; I am unsure about a generic solution) by using offsets to access
the TOS bits in PPP packets. We most likely face the same issue, to some
degree, with other encapsulations that pass through cerowrt (except that most
of those use an outer IP header from which we can extract DSCPs…, but I
digress).
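To illustrate the offset idea, here is a minimal Python sketch (not cerowrt’s actual filter code) that pulls the DSCP out of a raw Ethernet frame, both for plain IPv4/IPv6 and for PPPoE session frames. It assumes untagged Ethernet (no VLAN tag), the standard 6-byte PPPoE header followed by the 2-byte PPP protocol field, and the PPP protocol numbers 0x0021 (IPv4) and 0x0057 (IPv6); the function names are hypothetical. Under those assumptions the IP header simply starts 8 bytes later than usual, e.g. the IPv4 TOS byte moves from frame offset 15 to 23.

```python
import struct
from typing import Optional

# EtherType values (outer Ethernet header)
ETH_P_IP = 0x0800       # plain IPv4
ETH_P_IPV6 = 0x86DD     # plain IPv6
ETH_P_PPP_SES = 0x8864  # PPPoE session stage

# PPP protocol numbers (2-byte field right after the PPPoE header)
PPP_IPV4 = 0x0021
PPP_IPV6 = 0x0057

ETH_HLEN = 14           # untagged Ethernet header (no VLAN tag assumed)
PPPOE_HLEN = 6          # ver/type, code, session id, payload length
PPP_PROTO_LEN = 2       # PPP protocol field


def dscp_from_ip(ip: bytes, is_v6: bool) -> Optional[int]:
    """Return the 6-bit DSCP from the first bytes of an IP header."""
    if len(ip) < 2:
        return None
    if is_v6:
        # IPv6: 4-bit version, then the 8-bit traffic class spanning bytes 0-1
        tclass = ((ip[0] & 0x0F) << 4) | (ip[1] >> 4)
    else:
        # IPv4: the TOS byte is byte 1 of the header
        tclass = ip[1]
    return tclass >> 2  # drop the 2 ECN bits


def dscp_from_frame(frame: bytes) -> Optional[int]:
    """Return the DSCP of an Ethernet frame, or None if it is not IP."""
    if len(frame) < ETH_HLEN:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETH_P_IP:
        return dscp_from_ip(frame[ETH_HLEN:], is_v6=False)
    if ethertype == ETH_P_IPV6:
        return dscp_from_ip(frame[ETH_HLEN:], is_v6=True)
    if ethertype == ETH_P_PPP_SES:
        # Skip the PPPoE header, then read the PPP protocol field
        off = ETH_HLEN + PPPOE_HLEN
        if len(frame) < off + PPP_PROTO_LEN:
            return None
        (proto,) = struct.unpack_from("!H", frame, off)
        ip = frame[off + PPP_PROTO_LEN:]  # IP header starts 8 bytes later
        if proto == PPP_IPV4:
            return dscp_from_ip(ip, is_v6=False)
        if proto == PPP_IPV6:
            return dscp_from_ip(ip, is_v6=True)
    # Anything else would end up in the default priority class
    return None
```

A classifier that only implements the first two branches behaves like the current diffserv filters on ge00: every PPPoE frame falls through to `None`, i.e. the default class.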
2) SQM on ge00 shows better latency under load (LUL): the LUL increases by
~2 * fq_codel’s target (2 * 5 ms with the default target), so about 10 ms,
while SQM on pppoe-ge00 shows an LUL increase (LULI) roughly twice as large,
around 20 ms.
I have no idea why that is; if anybody has an idea, please chime in.
3) SQM on pppoe-ge00 yields a roughly 20% higher egress rate than SQM on ge00
(with ingress more or less identical between the two). Also, 2) and 3) do not
seem to be coupled: artificially reducing the egress rate on pppoe-ge00 to
match the egress rate seen on ge00 does not reduce the LULI to the
ge00-typical 10 ms; it stays at 20 ms.
For this I also have no good hypothesis; any ideas?
So the current choice is either to accept a noticeable increase in LULI (but
note that some years ago even an average LULI as low as 20 ms was most likely
rare in real life) or an equally noticeable decrease in egress bandwidth…
Best Regards
Sebastian
P.S.: It turns out, at least on my link, that when shaping on pppoe-ge00 the
kernel does not account for any header automatically, so I need to specify a
per-packet overhead (PPOH) of 40 bytes (on an ADSL2+ link with ATM link layer);
when shaping on ge00, however (with the kernel still terminating the PPPoE link
to my ISP), I only need to specify a PPOH of 26, as the kernel already adds the
14 bytes for the Ethernet header…
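The two PPOH values can be sanity-checked with a small Python sketch of ATM framing: each packet plus its per-packet overhead is padded up to whole 48-byte cell payloads, and each cell costs 53 bytes on the wire. It takes the P.S. at face value, i.e. it assumes the only difference between the two setups is the 14-byte Ethernet header the kernel already counts on ge00; the function name and the 1500-byte example size are hypothetical.

```python
ATM_CELL_PAYLOAD = 48  # payload bytes per ATM cell
ATM_CELL_SIZE = 53     # bytes per ATM cell on the wire (48 payload + 5 header)

# Hypothetical example: packet size as seen by the shaper on pppoe-ge00
IP_LEN = 1500
ETH_HLEN = 14          # Ethernet header the kernel already counts on ge00


def atm_wire_bytes(kernel_len: int, overhead: int) -> int:
    """Bytes actually sent over ATM for one packet.

    kernel_len: packet length as the shaper sees it
    overhead:   configured per-packet overhead (PPOH) in bytes
    """
    total = kernel_len + overhead
    cells = (total + ATM_CELL_PAYLOAD - 1) // ATM_CELL_PAYLOAD  # round up
    return cells * ATM_CELL_SIZE


if __name__ == "__main__":
    on_pppoe = atm_wire_bytes(IP_LEN, 40)            # shaping on pppoe-ge00
    on_ge00 = atm_wire_bytes(IP_LEN + ETH_HLEN, 26)  # shaping on ge00
    no_ppoh = atm_wire_bytes(IP_LEN, 0)              # overhead not configured
    print(on_pppoe, on_ge00, no_ppoh)  # prints "1749 1749 1696"
```

Since 1500 + 40 and 1514 + 26 are the same length, both configurations charge the same 33 cells (1749 bytes) for a full-size packet, whereas leaving the overhead unconfigured would undercount it as 32 cells.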
_______________________________________________
Cerowrt-devel mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/cerowrt-devel