Ok, the lockup goes away if you use no-split-gso on the cake qdiscs for the 
default traffic (noted below in the drr and hfsc cases with "!!! must use 
no-split-gso here !!!"). Only I’d like my 600 μs back. :)
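
For reference, here is a rough sketch of the hfsc setup quoted further down 
with the workaround applied to the default-traffic leaf (handle 20). The 
function name hfsc_cake_cmds is made up for illustration, and it only prints 
the tc commands so they can be reviewed before being run as root:

```shell
#!/bin/sh
# Hypothetical helper: prints the hfsc + cake setup for one device.
# Only change from the quoted setup: handle 20 uses cake with no-split-gso.
hfsc_cake_cmds() {
    dev="$1"
    cat <<EOF
tc qdisc add dev $dev root handle 1: hfsc default 10
tc class add dev $dev parent 1: classid 1:1 hfsc sc rate 200mbit ul rate 200mbit
tc class add dev $dev parent 1:1 classid 1:10 hfsc sc rate 100mbit ul rate 100mbit
tc class add dev $dev parent 1:1 classid 1:11 hfsc sc rate 100mbit ul rate 100mbit
tc filter add dev $dev protocol ip parent 1:0 prio 1 basic match "meta(vlan mask 0xfff eq 0xce4)" flowid 1:11
tc qdisc add dev $dev parent 1:10 handle 20: cake no-split-gso
tc qdisc add dev $dev parent 1:11 handle 21: cake
EOF
}

# Print the commands for eth0; pipe the output to "sh" as root to apply them.
hfsc_cake_cmds eth0
```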

This smells like the bug Toke fixed on Sep 12, 2018 in 
42e87f12ea5c390bf5eeb658c942bc810046160a, which was then reverted in the next 
commit because it had been fixed upstream. However, re-applying that commit 
does not make the lockup go away.

Perhaps there are more cases where skb_reset_mac_len(skb) needs to be called 
somewhere for VLAN support?

I managed to capture some output from what happens to hfsc:

[  683.864456] ------------[ cut here ]------------
[  683.869116] WARNING: CPU: 1 PID: 11 at net/sched/sch_hfsc.c:1427 0xf9ced4ef()
[  683.876267] Modules linked in: cls_u32 em_meta cls_basic sch_cake(O) sch_drr 
xt_ACCOUNT(O) sch_hfsc cls_fw sch_sfq sch_prio ipt_Ra
[  683.931317] CPU: 1 PID: 11 Comm: ksoftirqd/1 Tainted: G        W  O  
3.16.7-ckt9-voyage #1
[  683.939595] Hardware name: PC Engines APU/APU, BIOS 4.0 09/08/2014
[  683.945790]  00000000 00000000 f5c8bc9c c13167e9 00000000 f5c8bcb4 c102a7dd 
f9ced4ef
[  683.953792]  f1907c00 00000000 00000000 f5c8bcc4 c102a803 00000009 00000000 
f5c8bce4
[  683.961791]  f9ced4ef f1907fc8 732494ae 00000002 f1907c00 00000000 00000040 
f5c8bd00
[  683.969783] Call Trace:
[  683.972256]  [<c13167e9>] dump_stack+0x41/0x52
[  683.976729]  [<c102a7dd>] warn_slowpath_common+0x5c/0x73
[  683.982063]  [<f9ced4ef>] ? 0xf9ced4ee
[  683.985834]  [<c102a803>] warn_slowpath_null+0xf/0x13
[  683.990905]  [<f9ced4ef>] 0xf9ced4ee
[  683.994499]  [<c129edf2>] __qdisc_run+0x81/0xf0
[  683.999052]  [<c128b655>] __dev_queue_xmit+0x23d/0x35f
[  684.004216]  [<c128b78b>] dev_queue_xmit+0xa/0xc
[  684.008857]  [<f89fff29>] register_vlan_dev+0x938/0xe3b [8021q]
[  684.014799]  [<c128b33b>] dev_hard_start_xmit+0x29e/0x37b
[  684.020223]  [<c128b6c0>] __dev_queue_xmit+0x2a8/0x35f
[  684.025381]  [<c128b78b>] dev_queue_xmit+0xa/0xc
[  684.030016]  [<c12cf8d3>] arp_xmit+0x1c/0x47
[  684.034307]  [<c12cff27>] arp_send+0x2e/0x33
[  684.038598]  [<c12d01b4>] arp_process+0x288/0x4d8
[  684.043331]  [<c12ad986>] ? ip_forward_finish+0x66/0x6b
[  684.048581]  [<c128170e>] ? __kfree_skb+0x5d/0x5f
[  684.053303]  [<c12d04ce>] arp_rcv+0xca/0x102
[  684.057597]  [<c12895dd>] __netif_receive_skb_core+0x467/0x4b6
[  684.063453]  [<c1289674>] __netif_receive_skb+0x48/0x59
[  684.068698]  [<c1289cb9>] netif_receive_skb_internal+0x59/0x85
[  684.074557]  [<c128a2cc>] napi_gro_receive+0x31/0x6d
[  684.079549]  [<c10065ec>] ? text_poke_bp+0xa0/0xa0
[  684.084369]  [<f808604a>] 0xf8086049
[  684.087974]  [<c128a0b2>] net_rx_action+0x56/0x10e
[  684.092791]  [<c102d689>] __do_softirq+0x91/0x175
[  684.097523]  [<c102d783>] run_ksoftirqd+0x16/0x29
[  684.102255]  [<c1042734>] smpboot_thread_fn+0x108/0x11e
[  684.107505]  [<c104262c>] ? SyS_setgroups+0xa6/0xa6
[  684.112403]  [<c103de80>] kthread+0x9f/0xa4
[  684.116615]  [<c1319e01>] ret_from_kernel_thread+0x21/0x30
[  684.122126]  [<c103dde1>] ? kthread_freezable_should_stop+0x40/0x40
[  684.128407] ---[ end trace cb7778967851e0ad ]---
[  684.133646] ------------[ cut here ]------------
[  684.138337] WARNING: CPU: 1 PID: 11 at net/sched/sch_hfsc.c:1427 0xf9ced4ef()
[  684.145487] Modules linked in: cls_u32 em_meta cls_basic sch_cake(O) sch_drr 
xt_ACCOUNT(O) sch_hfsc cls_fw sch_sfq sch_prio ipt_Ra
[  684.200459] CPU: 1 PID: 11 Comm: ksoftirqd/1 Tainted: G        W  O  
3.16.7-ckt9-voyage #1
[  684.208736] Hardware name: PC Engines APU/APU, BIOS 4.0 09/08/2014
[  684.214933]  00000000 00000000 f5c8be98 c13167e9 00000000 f5c8beb0 c102a7dd 
f9ced4ef
[  684.222930]  f1907c00 00000000 00000000 f5c8bec0 c102a803 00000009 00000000 
f5c8bee0
[  684.230928]  f9ced4ef f1907fc8 7364c482 00000002 f1907c00 00000000 00000040 
f5c8befc
[  684.238926] Call Trace:
[  684.241399]  [<c13167e9>] dump_stack+0x41/0x52
[  684.245870]  [<c102a7dd>] warn_slowpath_common+0x5c/0x73
[  684.251206]  [<f9ced4ef>] ? 0xf9ced4ee
[  684.254979]  [<c102a803>] warn_slowpath_null+0xf/0x13
[  684.260055]  [<f9ced4ef>] 0xf9ced4ee
[  684.263651]  [<c129edf2>] __qdisc_run+0x81/0xf0
[  684.268203]  [<c128744f>] net_tx_action+0x91/0xdd
[  684.272927]  [<c102d689>] __do_softirq+0x91/0x175
[  684.277659]  [<c102d783>] run_ksoftirqd+0x16/0x29
[  684.282389]  [<c1042734>] smpboot_thread_fn+0x108/0x11e
[  684.287633]  [<c104262c>] ? SyS_setgroups+0xa6/0xa6
[  684.292529]  [<c103de80>] kthread+0x9f/0xa4
[  684.296735]  [<c1319e01>] ret_from_kernel_thread+0x21/0x30
[  684.302246]  [<c103dde1>] ? kthread_freezable_should_stop+0x40/0x40
[  684.308536] ---[ end trace cb7778967851e0ae ]---


> On Dec 28, 2018, at 1:58 PM, Pete Heist <[email protected]> wrote:
> 
> Note that this doesn’t happen when prio is used in place of hfsc and cake is 
> used in the leafs to do the rate limiting, i.e.:
> 
> tc qdisc add dev eth0 root handle 1: prio bands 2 priomap 1 1 1 1 1 1 1 1 1 1 
> 1 1 1 1 1 1
> tc qdisc add dev eth0 parent 1:1 handle 10: cake besteffort bandwidth 100mbit 
> ethernet # !!! must use no-split-gso here !!!
> tc qdisc add dev eth0 parent 1:2 handle 11: cake besteffort bandwidth 100mbit 
> ethernet ether-vlan
> tc filter add dev eth0 protocol all parent 1:0 prio 1 basic match "meta(vlan 
> mask 0xfff eq 0xce4)" flowid 1:2
> tc filter add dev eth0 protocol all parent 1:0 prio 2 u32 match u32 0 0 
> flowid 1:1
> 
> But it does happen when drr is used instead of prio:
> 
> tc qdisc add dev eth0 root handle 1: drr
> tc class add dev eth0 parent 1: classid 1:1
> tc class add dev eth0 parent 1: classid 1:2
> tc qdisc add dev eth0 parent 1:1 handle 10: cake besteffort bandwidth 100mbit
> tc qdisc add dev eth0 parent 1:2 handle 11: cake besteffort bandwidth 100mbit 
> ether-vlan
> tc filter add dev eth0 protocol all parent 1:0 prio 1 basic match "meta(vlan 
> mask 0xfff eq 0xce4)" flowid 1:2
> tc filter add dev eth0 protocol all parent 1:0 prio 2 u32 match u32 0 0 
> flowid 1:1
> 
> drr might ultimately be what I want to use for this, so I can use cake to do 
> the rate limiting instead of htb. prio works well but leads to starvation 
> when the rate limit is above what the CPU can handle.
> 
> Meanwhile, using htb classes with rate limits way above the actual rate, then 
> rate limiting in the cake leafs, works as well, but this seems like a hack:
> 
> tc qdisc add dev eth0 root handle 1: htb default 10
> tc class add dev eth0 parent 1: classid 1:1 htb rate 10gbit
> tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5gbit
> tc class add dev eth0 parent 1:1 classid 1:11 htb rate 5gbit
> tc filter add dev eth0 protocol ip parent 1:0 prio 1 basic match "meta(vlan 
> mask 0xfff eq 0xce4)" flowid 1:11
> tc qdisc add dev eth0 parent 1:10 handle 20: cake besteffort bandwidth 
> 100mbit ethernet # !!! must use no-split-gso here !!!
> tc qdisc add dev eth0 parent 1:11 handle 21: cake besteffort bandwidth 
> 100mbit ethernet ether-vlan
> 
>> On Dec 28, 2018, at 12:30 AM, Pete Heist <[email protected]> wrote:
>> 
>> I’m seeing what I think is an infinite loop when cake is used in a one-armed 
>> router configuration with hfsc as the rate limiter. Three APUs are connected 
>> to the same switch and the “middle” APU (apu1a) routes between the default 
>> VLAN and a tagged VLAN.
>> 
>> apu2a   <-- default VLAN -->   apu1a   <-- VLAN 3300 -->   apu2b
>> 
>> After qos is set up, ping from apu2a to apu2b still works fine. When iperf3 
>> is run from apu2a to apu2b it works fine, but when it goes in reverse (apu2b 
>> to apu2a), all traffic stops flowing from apu1a on the default VLAN. Traffic 
>> still flows from apu1a on VLAN 3300 however, with very high RTT (mean 
>> 500ms), leading me to believe that the cake instance on the default VLAN is 
>> in an infinite loop.
>> 
>> It does not happen with hfsc+fq_codel, or with htb+cake in the same 
>> configuration.
>> 
>> Here are the commands that set up qos, and it only locks up when cake is 
>> used as the instance at handle 20, not at handle 21:
>> 
>> -----
>> tc qdisc add dev eth0 root handle 1: hfsc default 10
>> tc class add dev eth0 parent 1: classid 1:1 hfsc sc rate 200mbit ul rate 
>> 200mbit
>> tc class add dev eth0 parent 1:1 classid 1:10 hfsc sc rate 100mbit ul rate 
>> 100mbit
>> tc class add dev eth0 parent 1:1 classid 1:11 hfsc sc rate 100mbit ul rate 
>> 100mbit
>> tc filter add dev eth0 protocol ip parent 1:0 prio 1 \
>>      basic match "meta(vlan mask 0xfff eq 0xce4)" flowid 1:11
>> tc qdisc add dev eth0 parent 1:10 handle 20: fq_codel # using cake here 
>> locks up !!!
>> tc qdisc add dev eth0 parent 1:11 handle 21: cake
>> -----
>> 
>> I’m using sch_cake and tc-adv from the current HEAD, on kernel 3.16.7 (yeah, 
>> I know).
>> 
>> root@apu1a:~/qos# uname -a
>> Linux apu1a 3.16.7-ckt9-voyage #1 SMP Thu Apr 23 11:10:44 HKT 2015 i686 
>> GNU/Linux
>> 
>> Any ideas from just this? Otherwise, I can only think to hook up the 
>> serial cable and start with the printk’s…
>> 
> 
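
A side note on the filters quoted above: they match on 0xce4 under a 12-bit 
mask, which is the same VLAN 3300 shown in the topology diagram. A quick 
shell check of that arithmetic (the variable name vid is just for 
illustration):

```shell
# 0xce4 is the value the "meta(vlan mask 0xfff eq 0xce4)" filters compare against;
# the 0xfff mask keeps only the 12-bit VLAN ID.
vid=$((0xce4 & 0xfff))
printf 'VLAN ID: %d\n' "$vid"
```

This prints "VLAN ID: 3300".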
