On 22.08.2019 at 19:03, Dave Taht wrote:
Sebastian Gottschall <[email protected]> writes:

On 22.08.2019 at 15:15, Dave Taht wrote:
It's very good to know how much folk have been struggling to keep
things from OOMing on 32MB platforms. I'd like to hope that the
unified memory management in cake (vs a collection of QoS qdiscs) and
the new fq_codel for wifi stuff (cutting it down to 1 alloc from four)
help massively on this issue, but until today I was unaware of how
much the field may have been patching things out.

The default 32MB memory limit in fq_codel comes from Google's stressing
about 10GigE networking. 4MB is the limit in openwrt,
which is suitable for ~1Gbit, and is sort of there due to 802.11ac's
maximum (impossible to hit) of a txop that large.
I did kind of conflate "qos + fq_codel" with wifi in this message. It
looks like yer staying with me.

Something as small as 256K is essentially about 128 full size packets
(and often, acks from an ethernet device's rx ring eat 2k).
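
As a quick check of that arithmetic, here is a minimal sketch. It assumes every queued packet is charged about 2k of memory (the "eat 2k" figure above); the function name and the 2048-byte constant are illustrative, not taken from any kernel source.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumption: each queued packet is charged roughly 2 KiB of memory,
# whether it is a full-size frame or a small ack.
ASSUMED_TRUESIZE = 2048


def packets_that_fit(memory_limit_bytes, truesize_bytes=ASSUMED_TRUESIZE):
    """Return how many packets fit under a byte-based memory limit."""
    return memory_limit_bytes // truesize_bytes


if __name__ == "__main__":
    for label, limit in (("256K", 256 * KIB), ("4MB", 4 * MIB), ("32MB", 32 * MIB)):
        print(f"{label}: about {packets_that_fit(limit)} packets")
```

Under that assumption, the 256K limit holds 128 packets, while the 4MB and 32MB limits hold 2048 and 16384.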
What I miss in mac80211 is an option like "fq_codel = off".
It's essential, and I will definitely work on a patch to handle it this
way for low-memory 802.11n platforms.
Well, it would be my hope that turning it off would A) not help that
much on memory or cpu and B) show such a dramatic reduction in
multi-station performance that you'd immediately turn it on again.
Isn't it better to have a working platform with less performance than a crashing platform with no performance? I mean, I can use older mac80211 versions without that issue on a typical NanoStation 2/5, which is often used just as a CPE device.

But current mac80211 versions (current meaning the last 2-3 years) are just unstable and run out of memory after a while. The only thing that helped was cutting down the memory limit of fq_codel inside mac80211. I also have another fancy test unit, a Linksys WRT400 with 32 MB RAM and two ath9k-based wifi chipsets. There is no hope of it running stable for even 5 minutes, even with a single connection under load (my crash test is running an HDTV IPTV stream converted to unicast using a stateless EoIP tunnel).

I try to encourage folk to run the rtt_fair tests in flent when
twiddling with wifi. Those really show how bad things are when you
don't have ATF + FQ + Per station aggregation and lots of
clients. Single threaded tests are misleading.
I know, but even single-threaded tests aren't working well on such devices. So there is no need to talk about the benefits of ATF, fq_codel, etc., but there is a need to talk about making their use configurable, which also allows disabling them if required. If you just have a CPE device with PPPoE running on it, which is common for WISPs, there is no need for much fair queuing. That is a task for the access point. Another typical use for devices like the NanoStation, Rocket, Bullet, etc. is simple long-range point-to-point links. My assumption is that this is the main use for high-gain devices like these.

So we aren't talking about a typical cool and fancy AP. We are talking about compatibility with low-end devices without running out of resources. I'm a typical programmer from the 80s: keep it as small, simple and resource-efficient as possible. These coding standards should still be considered today, even if I don't write Tetris clones anymore that run in 512-byte boot sectors using the MS-DOS built-in debug assembler program.

I gave a good demo of why this is (was!), here: 
https://www.youtube.com/watch?v=Rb-UnHDw02o&t=1551s

and there's more in the "Ending the Anomaly" paper. Perversely though,
now that we can do 25x latency reductions and 2.5x more throughput,
more memory is needed to achieve those goals in some cases, which
is part of my concern about chopping things down to 256k here.
The structure of the new fq_codel for wifi subsystem is "one in the
hardware, one ready to go, and the rest accumulating". I
typically see about 13-20 packets in an aggregate. 256k strikes me as
a bit small.
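
To make the "256k is a bit small" concern concrete, here is a rough sketch. It assumes one memory limit shared by all stations, 2k charged per packet, and two aggregates per station (one in the hardware, one ready to go) with nothing left over for accumulating. All names and constants are illustrative assumptions, not values read from mac80211.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumption: each queued packet is charged roughly 2 KiB.
ASSUMED_TRUESIZE = 2048


def stations_fully_fed(memory_limit_bytes, packets_per_aggregate,
                       aggregates_per_station=2,
                       truesize_bytes=ASSUMED_TRUESIZE):
    """Stations that can hold 'one in the hardware, one ready to go'
    before the shared memory limit is exhausted."""
    per_station = aggregates_per_station * packets_per_aggregate * truesize_bytes
    return memory_limit_bytes // per_station


if __name__ == "__main__":
    for pkts in (13, 20):
        n = stations_fully_fed(256 * KIB, pkts)
        print(f"256K, {pkts} packets/aggregate: {n} stations")
    print("4MB, 20 packets/aggregate:", stations_fully_fed(4 * MIB, 20), "stations")
```

With these assumptions a 256K limit keeps only three or four busy stations supplied with two aggregates each, which is one way to read why such a small limit could hurt multi-station performance.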
The rule is that 256k is used for HT only, and if VHT is involved
the limit of 4 MB is used.
But now comes the point: all 802.11ac platforms have 64 MB RAM or
more, but ath10k chipsets use
about 40 MB of shared memory. So, hmm, we are hitting the wall
again. Most routers with 802.11ac have 128 MB, but some (notably
D-Link) have just 64 MB.
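
The rule and the resulting squeeze can be sketched as follows. This mirrors only what is described in this message (256k for HT-only, 4 MB once VHT is involved, about 40 MB taken by ath10k); it is not claimed to match upstream mac80211, and the function names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def fq_memory_limit(supports_vht):
    """Memory limit rule as described in the thread:
    4 MB when VHT (802.11ac) is involved, 256k for HT-only."""
    return 4 * MIB if supports_vht else 256 * KIB


def ram_left_for_everything_else(total_ram, driver_shared_memory, supports_vht):
    """RAM remaining after the wifi driver's shared memory and a
    completely full fq queue (worst case) are subtracted."""
    return total_ram - driver_shared_memory - fq_memory_limit(supports_vht)


if __name__ == "__main__":
    # 64 MB 802.11ac router with roughly 40 MB used by ath10k (figures from the thread).
    left = ram_left_for_everything_else(64 * MIB, 40 * MIB, supports_vht=True)
    print(f"left for kernel and userspace: {left // MIB} MB")
```

On the 64 MB boards mentioned above, that leaves roughly 20 MB for the kernel, userspace and every other buffer when the queue is full.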
Ugh.

Is it just the mips boxes with so little ram? All the arm routers I have
have at least 128, some as much as 512.
You got it: all the MIPS routers. Most problematic are the TP-Link WR841 (and similar series) and UBNT devices, of course. These are 802.11n but ship with just 32 MB RAM. But there are others too, of course, and I love to maintain all the older devices for the community. We really don't need to care about the newer ARM-based devices. Broadcom ARM CPUs come with wifi chipsets that are not supported by Linux/mac80211 anyway, or are just badly supported in the case of newer chipsets using brcmfmac (but the original proprietary Broadcom driver is unstable too, of course). All other models based on QCA IPQ8064 etc. come with 256 MB and more, and we really only need to take care of ath9k and ath10k (soon maybe ath11k); everything else doesn't matter. The Linksys WRTXXXX series has a mac80211 driver, but Marvell stopped maintaining it at a point where it was still shit and unstable, and it's mainly based on a binary firmware blob.



Yes, having a wifi chip that can theoretically have 4MB in transit
with so little ram is problematic.

Dear dlink: don't do that. It hurts when you do that.

I talked a lot with D-Link about this issue, but D-Link's solution was just switching to a cheaper MediaTek MIPS-based platform. Now we have more RAM, but a featureless chipset.
Same for TP-Link.
I haven't checked, but does this patch still exist in openwrt/dd-wrt?
It had helped a lot when under memory pressure from
a lot of small packets.

https://github.com/dtaht/cerowrt-3.10/blob/master/target/linux/generic/patches-3.10/657-qdisc_reduce_truesize.patch

Arguably this could be made more aggressive, but it massively reduced
memory burdens at the time I did it when
flooding the device, or having lots of acks, and while it cost cpu it
saved on ooming.
Hmm, let me check -> nope, at least it's not in my tree. But it will be soon :-)
Well, I sent along a mildly improved version of the idea.

I can really see some sort of "test my qos" script that attempts
to flood every queue on the system. And wider adoption of
cake, which is lighter weight than the alternatives.

One idea that's in cake: we'd hoped to capture the most typical
qos setups with it with "models". It's very easy to add a new model
(besteffort, diffserv3, diffserv4) (it's a lookup table and bandwidth
allocation call), but lacking feedback on more typical QoS constructs
from the field, that's where it ended. When we started the project,
I figured we'd end up with 20+ models before the end.
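
To illustrate what "a lookup table and bandwidth allocation call" could look like, here is a toy sketch. It is not cake's code: the three-tin layout and the bandwidth shares are placeholders chosen for illustration, and `make_model` is a hypothetical helper. Only the EF (46) and CS1 (8) codepoint values are standard DSCP numbers.

```python
# Standard DSCP codepoints (6-bit values).
EF = 46   # Expedited Forwarding
CS1 = 8   # Class Selector 1, commonly used for bulk/background


def make_model(dscp_to_tin, tin_shares, default_tin=0):
    """Build a toy 'model': a 64-entry DSCP lookup table plus a function
    that splits a base rate among tins.

    tin_shares is a list of (numerator, denominator) pairs, one per tin.
    """
    table = [default_tin] * 64
    for dscp, tin in dscp_to_tin.items():
        table[dscp] = tin

    def classify(dscp):
        # Mask to 6 bits so any input maps into the table.
        return table[dscp & 0x3F]

    def allocate(base_rate):
        return [base_rate * num // den for num, den in tin_shares]

    return classify, allocate


# Hypothetical three-tin model: 0 = best effort, 1 = bulk, 2 = voice.
# The shares below are illustrative placeholders.
classify, allocate = make_model(
    dscp_to_tin={CS1: 1, EF: 2},
    tin_shares=[(1, 1), (1, 16), (1, 4)],
)

if __name__ == "__main__":
    print("EF goes to tin", classify(EF))
    print("per-tin rates for 16000 kbit/s:", allocate(16000))
```

A new model in this style would just be a different mapping and a different list of shares, which is why a tc class dump from a field setup would be enough input to draft one.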

It would be good to get a tc class dump or output from more typical
QoS Setups.

In sqm and cake...
we have a terrible tendency to tell people "no, just use the defaults!
they work! trust us!"...
Yeah, I know that feeling. But I can never trust the users. They always do what they think is good for them, and everyone thinks he knows better since he read something on Google / Reddit.

who generally don't believe us and want to keep doing things the
way they always have.

In more than a few circumstances they are right, but we don't understand
what they are trying to do.

As one case that cake doesn't handle, at least some iptv setups are
visible as a strict priority queue over everything else, below which you
do everything else, so the tv stream never, ever, drops a packet.
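
A minimal sketch of that strict priority arrangement, for illustration only. The class name and the `is_priority` predicate are hypothetical; how a real setup recognizes the tv stream (DSCP, VLAN, address) is left to the caller, since the thread does not give the markings.

```python
from collections import deque


class StrictPriorityScheduler:
    """Two-level strict priority: the priority queue is always drained
    before any best-effort packet is sent."""

    def __init__(self, is_priority):
        self._is_priority = is_priority   # caller-supplied classifier
        self._priority = deque()
        self._best_effort = deque()

    def enqueue(self, packet):
        if self._is_priority(packet):
            self._priority.append(packet)
        else:
            self._best_effort.append(packet)

    def dequeue(self):
        """Return the next packet to send, or None if both queues are empty."""
        if self._priority:
            return self._priority.popleft()
        if self._best_effort:
            return self._best_effort.popleft()
        return None


if __name__ == "__main__":
    sched = StrictPriorityScheduler(lambda pkt: pkt.startswith("tv"))
    for pkt in ("web1", "tv1", "web2", "tv2"):
        sched.enqueue(pkt)
    while (pkt := sched.dequeue()) is not None:
        print(pkt)
```

The obvious trade-off is that the best-effort queue can be starved whenever the priority traffic alone fills the link, so this only works if the tv stream's rate is bounded below the link rate.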
As I mentioned before, my solution for IPTV is layer 2 tunneling to get rid of multicast issues; it also converts everything to a single connection. I use an RFC-compliant Ethernet-over-IP tunnel for this, which is not upstream in Linux but is in FreeBSD. There was a driver for kernel 2.4 around many years ago, and I have maintained it up to the latest kernel. It's robust, handles fragmentation and has just 12 bytes of overhead.

We didn't do that, but could *easily* add an iptv model to shape
inbound better - if we knew more about how Free, FT, etc. construct
their packets.

Inbound, they are marked with TOS. Typical internet traffic has 0, of course; IPTV has X and voice has Y (don't ask me for the numbers, I don't have them in mind right now). But for DHCP leases you need to mark your own packets with another DSCP, otherwise the ISP returns no IP. I don't know why this was done, but it has to be handled. Normally Orange ships black boxes as routers, and to get them working with free systems some people reverse engineered that shit. My conclusion is that it's some sort of obfuscation to keep out third-party hardware, since the EU regulated the ISPs so that they are forced to allow third-party products, which they still try to avoid (refusing support for internet problems, etc.).

Similarly some folk in this world want strict priority for EF.

There are two other dubious things in the fq_codel for wifi stack
presently. Right now the codel target is set too high for p2p use
(20ms, where 6ms seems more right), and it also flips up to a really
high target and interval AND turns off ecn when there's more than a
few stations available (rather than "active") - it's an overly
conservative figure we used back when we had major issues with
powersave and multicast that I'd hoped we could cut back to normal
after we got another round of research funding and feedback from the
field (which didn't happen, and we never got around to making it
configurable, and being 25x better than it was before seemed "enough").
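
The behaviour described above, and what a configurable version might look like, can be sketched roughly as follows. Only the 20ms and 6ms targets come from this message; the station cutoff, the intervals and the "conservative" 50ms/300ms values are placeholders picked for illustration, and none of the names below exist in mac80211.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodelParams:
    target_ms: int
    interval_ms: int
    ecn: bool


# 20ms target is from the message; the 100ms interval is an assumption.
NORMAL = CodelParams(target_ms=20, interval_ms=100, ecn=True)
# Placeholder for the "really high target and interval" with ecn off.
CONSERVATIVE = CodelParams(target_ms=50, interval_ms=300, ecn=False)
# The 6ms target suggested for p2p links; interval again assumed.
P2P = CodelParams(target_ms=6, interval_ms=100, ecn=True)


def pick_params(stations_available, p2p_link=False, few_stations=4):
    """Choose codel parameters from the number of stations that are
    merely available (not necessarily active), as the message describes.

    few_stations is a hypothetical cutoff for 'more than a few'.
    """
    if stations_available > few_stations:
        return CONSERVATIVE
    if p2p_link and stations_available <= 1:
        return P2P
    return NORMAL


if __name__ == "__main__":
    print(pick_params(1, p2p_link=True))
    print(pick_params(3))
    print(pick_params(12))
```

Keying the switch on active rather than available stations, or exposing the cutoff as a knob, would be the kind of configurability the paragraph above says was never added.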

I was puzzled at battlemesh as to why I had dropping at about 50ms
delay rather than ecn, and thought it was something
else, and this morning I'm thinking that folk have been reducing the
memlimit to 256k rather....

_______________________________________________
Cake mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/cake
