Re: [Cake] cake memory consumption

Sebastian Gottschall Mon, 16 Sep 2019 06:24:27 -0700


On 16.09.2019 at 14:00, Dave Taht wrote:

I am puzzled as to why fq_codel_fast would use more ram than fq_codel
would, was sce (gso-splitting) enabled?

that can be within typical error tolerance. he just used "free" for the comparison


similarly, the differences between hfsc and htb are interesting. I
don't get that either.

How many cake instances are being created?

according to his config, i assume 7


And for the sake of discussion, what does cake standalone consume?

that's a rare condition for my testers. this is something for PCs but not for routers :-)

this is something i need to find out for myself on my routers


On Mon, Sep 16, 2019 at 11:22 AM Sebastian Gottschall
<[email protected]> wrote:

after we found out serious out of memory issues on smaller embedded devices 
(128 mb ram) we made some benchmarks with different schedulers
with the result that cake takes a serious amount of memory. we use the out of 
tree cake module and we use it class based since we have complex methods of 
doing qos per interface, per mac address or even per

I note that I have often thought that mac address functionality
might be a valuable mode for cake.

that wouldn't help. there are many variations with multiple different settings for different mac addresses. as far as i have seen cake is not designed to work like this. this is why we have to use a class / qdisc tree in my case

ip/network. so its not just simple cake on a single interface solution. we made 
some benchmarks with different schedulers. does anybody have a solution for 
making that better?

With such complexity required I'd stick to hfsc + fq_X rather than
layer in cake.

yea. i told them that too. but people complain that cake runs soooooooo much better. or at least a little bit. hard to get around this argument


Understanding the model (sh -x the tc commands for, say, hfsc +
something and htb + something ) your users require, though, would be
helpful. We tried to design cake to include a jillion optimizations such
as ack prioritization and per network fq (instead of per flow/per host),
but we couldn't possibly cover all use cases in it without more
feedback from the field.

Still... such a big difference in memory use doesn't add up. Cake has
a larger fixed memory allocation

4 mb max as i have seen. but with 7 instances it's coming up to 28. but i still see much more here. consider that i implemented the same limitation in fq_codel and also fq_codel_fast

(model specific. on bigger devices i don't restrict the memory to 4 mb)

than fq_codel, but the rest is just packets which come from global memory.

Can you point to a build and a couple targets we could try? I am
presently travelling (in portugal) and won't
be back online until later this week.

what do you mean by targets? the build for testing was always the same. i requested to do the test just with multiple schedulers, which is switchable in my gui.

what i can do is a tree-like print to visualize how it's built (or i simply print out the qdisc/class/filters for you)


the test itself was made on a tplink archer c7 v2.

HTB/FQ_CODEL ------- 62M
HTB/SFQ ------- 62M
HTB/PIE ------- 62M
HTB/FQ_CODEL_FAST ------- 67M
HTB/CAKE -------111M

HFSC/FQ_CODEL_FAST ------- 47M
HTB/PIE ------- 49M
HTB/SFQ ------- 50M
HFSC /FQ_CODEL ------- 52M
HFSC/CAKE -------109M
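
To put the table next to the figures mentioned in the replies, here is a rough Python sketch of the arithmetic. The count of 7 cake instances and the 4 MB per-instance limit are taken from the replies in this thread and are assumptions, not measurements; the function name is made up for illustration. It only shows that the gap between cake and fq_codel is larger than 7 x 4 MB would explain.

```python
# Figures in MB, copied from the table above (system-wide readings).
measured_mb = {
    ("HTB", "FQ_CODEL"): 62,
    ("HTB", "CAKE"): 111,
    ("HFSC", "FQ_CODEL"): 52,
    ("HFSC", "CAKE"): 109,
}

# Assumptions taken from the replies in this thread, not measured here.
CAKE_INSTANCES = 7          # "according to his config, i assume 7"
PER_INSTANCE_LIMIT_MB = 4   # "4 mb max as i have seen"


def extra_over_fq_codel(shaper: str) -> int:
    """Extra memory (MB) seen with cake compared to fq_codel under one shaper."""
    return measured_mb[(shaper, "CAKE")] - measured_mb[(shaper, "FQ_CODEL")]


# Memory the per-instance limits alone would account for.
expected_fixed_mb = CAKE_INSTANCES * PER_INSTANCE_LIMIT_MB

for shaper in ("HTB", "HFSC"):
    extra = extra_over_fq_codel(shaper)
    print(f"{shaper}: cake is {extra} MB above fq_codel, "
          f"limits alone would explain {expected_fixed_mb} MB")
```

Since the readings are system-wide, the difference is only an indication, not an exact per-qdisc figure.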


consider that the benchmark doesn't show the real values. it's the system overall and
does not account for memory taken by the wireless driver, for instance, which is
about 45 mb of ram for ath10k.
so this makes it all even worse, unfortunately, since there is not that much ram
left for cake. just about 70mb maybe.

On 08.09.2019 at 19:27, Jonathan Morton wrote:

You could also set it back to 'internet' and progressively reduce the
bandwidth parameter, making the Cake shaper into the actual bottleneck.
This is the correct fix for the problem, and you should notice an
instant improvement as soon as the bandwidth parameter is correct.

Hand tuning this one link is not a problem. I'm searching for a set of settings 
that will provide generally good performance across a wide range of devices, 
links, and situations.

From what you've indicated so far there's nothing as effective as a correct 
bandwidth estimation if we consider the antenna (link) a black box. Expecting 
the user to input expected throughput for every link and then managing that 
information is essentially a non-starter.

Radio tuning provides some improvement, but until Ubiquiti starts shipping with 
Codel on non-router devices I don't think there's a good solution here.

Any way to have the receiving device detect bloat and insert an ECN?

That's what the qdisc itself is supposed to do.

I don't think the time spent in the intermediate device is detectable at the 
kernel level but we keep track of latency for routing decisions and could 
detect bloat with some accuracy, the problem is how to respond.

As long as you can detect which link the bloat is on (and in which direction), 
you can respond by reducing the bandwidth parameter on that half-link by a 
small amount.  Since you have a cooperating network, maintaining a time 
standard on each node sufficient to observe one-way delays seems feasible, as 
is establishing a normal baseline latency for each link.

The characteristics of the bandwidth parameter being too high are easy to 
observe.  Not only will the one-way delay go up, but the received throughput in 
the same direction at the same time will be lower than configured.  You might 
use the latter as a hint as to how far you need to reduce the shaped bandwidth.
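
A minimal Python sketch of that check, under stated assumptions: the function name, the 5 ms delay margin and the 10% cap on a single reduction are all invented for illustration and do not come from this thread or from cake itself. It lowers the rate only when both symptoms appear together, and uses the received throughput as the hint for how far to go.

```python
def suggest_reduced_rate(configured_mbps, received_mbps,
                         owd_ms, baseline_owd_ms,
                         delay_margin_ms=5.0, max_step=0.10):
    """Return a lower shaper rate if the half-link looks bloated.

    delay_margin_ms and max_step are illustrative guesses, not tuned values.
    """
    delay_up = owd_ms > baseline_owd_ms + delay_margin_ms
    throughput_short = received_mbps < configured_mbps
    if not (delay_up and throughput_short):
        # Only one symptom (or none): leave the shaper alone.
        return configured_mbps
    # Move toward the observed throughput, but by a small amount per step.
    floor = configured_mbps * (1.0 - max_step)
    return max(received_mbps, floor)


# Delay well above baseline and throughput short of the configured 100 Mbps:
print(suggest_reduced_rate(100, 80, owd_ms=40, baseline_owd_ms=10))
```

Capping each step keeps one noisy measurement from collapsing the shaper; repeated observations would walk the rate down gradually.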

Deciding when and by how much to *increase* bandwidth, which is presumably 
desirable when link conditions improve, is a more difficult problem when the 
link hardware doesn't cooperate by informing you of its status.  (This is 
something you could reasonably ask Ubiquiti to address.)

I would assume that link characteristics will change slowly, and run an 
occasional explicit bandwidth probe to see if spare bandwidth is available.  If 
that probe comes through without exhibiting bloat, *and* the link is otherwise 
loaded to capacity, then increase the shaper by an amount within the probe's 
capacity of measurement - and schedule a repeat.
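
The increase rule can be sketched in Python as follows. The function and parameter names are hypothetical, and how "bloat" and "loaded to capacity" are actually measured is left out; this only encodes the two conditions stated above.

```python
def next_shaper_rate(current_mbps, probe_bloated, link_saturated,
                     probe_mbps=1.0):
    """Raise the shaper only if the probe saw no bloat on a saturated link.

    probe_mbps is the bandwidth the probe can actually measure; the
    increase never exceeds it.
    """
    if not probe_bloated and link_saturated:
        return current_mbps + probe_mbps
    # Either the probe hit bloat or the link was idle, so there is no
    # evidence of spare capacity: keep the current rate.
    return current_mbps


print(next_shaper_rate(10.0, probe_bloated=False, link_saturated=True))
```

A caller would schedule the next probe after each call, whatever the result.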

A suitable probe might be 100x 1500b packets paced out over a second, bypassing 
the shaper.  This will occupy just over 1Mbps of bandwidth, and can be expected 
to induce 10ms of delay if injected into a saturated 100Mbps link.  Observe the 
delay experienced by each packet *and* the quantity of other traffic that 
appears between them.  Only if both are favourable can you safely open the 
shaper, by 1Mbps.
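
The probe figures can be checked with a few lines of Python. This is a back-of-the-envelope sketch that counts payload bytes only and ignores framing overhead; the function names are made up for illustration. It gives roughly 1.2 Mbps for the probe and roughly 12 ms of added queueing on a saturated 100 Mbps link, in the same ballpark as the numbers quoted above.

```python
def probe_rate_mbps(packets, packet_bytes, duration_s):
    """Bandwidth occupied by the probe, in Mbit/s."""
    return packets * packet_bytes * 8 / duration_s / 1e6


def added_delay_ms(packets, packet_bytes, link_mbps):
    """Time the link needs to drain the whole probe, in milliseconds."""
    return packets * packet_bytes * 8 / (link_mbps * 1e6) * 1e3


print(probe_rate_mbps(100, 1500, 1.0))   # about 1.2
print(added_delay_ms(100, 1500, 100))    # about 12
```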

Since wireless links can be expected to change their capacity over time, due to 
e.g. weather and tree growth, this seems to be more generally useful than a 
static guess.  You could deploy a new link with a conservative "guess" of say 
10Mbps, and just probe from there.

  - Jonathan Morton
_______________________________________________
Cake mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/cake
