On Sat, May 20, 2017 at 17:30 +0200, Sebastien Marie wrote: > Hi, > > Yesterday, I upgraded my amd64 gateway (pppoe + vlans) with a recent > -current in order to test a bit codel queue. > > OpenBSD 6.1-current (GENERIC.MP) #67: Thu May 18 18:28:26 MDT 2017 > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > > This first panic occured while running (panic after 1 day): > > [handwritten ddb output: I stripped out pointer addresses] > > panic: kernel diagnostic assertion "ifq_is_serialized(ifq)" failed: file > "/usr/src/sys/net/ifq.c" line 394 > Stopped at db_enter+0x9: leave > TID PID UID PRFLAGS PFLAGS CPU COMMAND > *208580 11878 0 0x14000 0x200 0 softnet > 62492 51658 0 0x14000 0x200 1 systq > db_enter(...) at db_enter+0x9 > panic(...) at panic+0x102 > __assert(...) at __assert+0x35 > ifq_mfreeml(...) at ifq_mfreeml+0x6c > fq_codel_deq_begin(x,x,x,x,7,x) at fq_codel_deq_begin+0x1d9 > ifq_deq_begin(x,7,x,0,x,x) at ifq_deq_begin+0x45 > ifq_dequeue(x,x,x,0,x,x) at ifq_dequeue+0x1c > sppp_dequeue(x,x,c,x,x,x) at sppp_dequeue+0x6f > pppoe_start(x,285,x,x,7,x) at pppoe_start+0xbf > if_qstart_compat(x,x,c,5,x,20) at if_qstart_compat+0x3f > sppp_cp_send(x,8021,1,e,6,x) at sppp_cp_send+0x202 > sppp_ipcp_scr(x,x,1,x,x,x) at sppp_ipcp_scr+0x97 > sppp_phase_network(x,f0,x,5,0,x) at sppp_phase_network+0xad > sppp_pap_input(x,x,0,x,0,x) at sppp_pap_input+0x33f > endtrace frame: 0xffff8000131a6dc0, count: 0 > > I kept the photography of the screen if pointer addresses are in > interest (`x' inside the trace). > > > > # ifconfig pppoe0 > pppoe0: flags=208851<UP,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST,AUTOCONF6> mtu > 1492 > index 17 priority 0 llprio 3 > dev: vlan10 state: session > sid: 0x3fad PADI retries: 1 PADR retries: 0 time: 01:34:11 > sppp: phase network authproto pap authname "XXX" > groups: pppoe internet egress > status: active > inet6 fe80::cabe:19ff:fee2:2ced%pppoe0 -> prefixlen 64 scopeid 0x11 > inet 109.190.33.218 --> 178.32.37.4 netmask 0xffffffff > > # ifconfig vlan10 > vlan10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> rdomain 10 mtu 1500 > lladdr c8:be:19:e2:2c:ed > index 7 priority 0 llprio 3 > vlan: 10 parent interface: re0 > vnetid: 10 > parent: re0 > groups: vlan adsl > status: active > inet 192.168.10.1 netmask 0xffffff00 broadcast 192.168.10.255 > inet 192.168.1.10 netmask 0xffffff00 broadcast 192.168.1.255 > > # ifconfig re0 > re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1504 > lladdr c8:be:19:e2:2c:ed > index 1 priority 0 llprio 3 > media: Ethernet autoselect (1000baseT full-duplex) > status: active > > > From my pf configuration, I had removed queue with bandwidth and > replaced them with: > > queue fq on pppoe0 flows 1024 default > queue fq on re0 flows 1024 default > > Thanks. > -- > Sebastien Marie
Hi, you've managed to find and open a big can of worms :-) The reason for the panic is that fq-codel relies on features provided by the new queueing interface and it would be counter- productive to shove in a compatibility layer. The new queueing interface is used when an interface defines if_qstart callback instead of an old if_start. The good news is that the compat shim that we have if_qstart_compat can be made to call into ifq_serialize or do something similar since under KERNEL_LOCK your ifq is serialized by definition. Regarding your configuration, I'd advise you to use hfsc (the default queueing) on pppoe0. Set it to your *uplink* bandwidth: queue pppoe0-rootq on pppoe0 bandwidth 2M max 2M default And use the flow queue only on the underlying interface. Since this is what I've been advising folks to use, I didn't manage to find this bug beforehands. Apologies for that. I'll try to come up with the diff to do ifq_serialize for if_qstart_compat.
