On Sun, May 21, 2017 at 11:53 +0200, Mike Belopuhov wrote:
> On Sat, May 20, 2017 at 18:53 +0200, Mike Belopuhov wrote:
> > On Sat, May 20, 2017 at 17:30 +0200, Sebastien Marie wrote:
> > > Hi,
> > > 
> > > Yesterday, I upgraded my amd64 gateway (pppoe + vlans) with a recent
> > > -current in order to test a bit codel queue.
> > > 
> > > OpenBSD 6.1-current (GENERIC.MP) #67: Thu May 18 18:28:26 MDT 2017
> > >     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > > 
> > > This first panic occured while running (panic after 1 day):
> > > 
> > > [handwritten ddb output: I stripped out pointer addresses]
> > > 
> > > panic: kernel diagnostic assertion "ifq_is_serialized(ifq)" failed: file 
> > > "/usr/src/sys/net/ifq.c" line 394
> > > Stopped at        db_enter+0x9:   leave
> > >   TID     PID     UID     PRFLAGS PFLAGS  CPU     COMMAND
> > >     *208580       11878   0       0x14000 0x200   0       softnet
> > >       62492       51658   0       0x14000 0x200   1       systq
> > > db_enter(...) at db_enter+0x9
> > > panic(...) at panic+0x102
> > > __assert(...) at __assert+0x35
> > > ifq_mfreeml(...) at ifq_mfreeml+0x6c
> > > fq_codel_deq_begin(x,x,x,x,7,x) at fq_codel_deq_begin+0x1d9
> > > ifq_deq_begin(x,7,x,0,x,x) at ifq_deq_begin+0x45
> > > ifq_dequeue(x,x,x,0,x,x) at ifq_dequeue+0x1c
> > > sppp_dequeue(x,x,c,x,x,x) at sppp_dequeue+0x6f
> > > pppoe_start(x,285,x,x,7,x) at pppoe_start+0xbf
> > > if_qstart_compat(x,x,c,5,x,20) at if_qstart_compat+0x3f
> > > sppp_cp_send(x,8021,1,e,6,x) at sppp_cp_send+0x202
> > > sppp_ipcp_scr(x,x,1,x,x,x) at sppp_ipcp_scr+0x97
> > > sppp_phase_network(x,f0,x,5,0,x) at sppp_phase_network+0xad
> > > sppp_pap_input(x,x,0,x,0,x) at sppp_pap_input+0x33f
> > > endtrace frame: 0xffff8000131a6dc0, count: 0
> > > 
> > > I kept the photography of the screen if pointer addresses are in
> > > interest (`x' inside the trace).
> > > 
> > > 
> > > 
> > > # ifconfig pppoe0
> > > pppoe0: flags=208851<UP,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST,AUTOCONF6> 
> > > mtu 1492
> > >         index 17 priority 0 llprio 3
> > >         dev: vlan10 state: session
> > >         sid: 0x3fad PADI retries: 1 PADR retries: 0 time: 01:34:11
> > >         sppp: phase network authproto pap authname "XXX"
> > >         groups: pppoe internet egress
> > >         status: active
> > >         inet6 fe80::cabe:19ff:fee2:2ced%pppoe0 ->  prefixlen 64 scopeid 
> > > 0x11
> > >         inet 109.190.33.218 --> 178.32.37.4 netmask 0xffffffff
> > > 
> > > # ifconfig vlan10
> > > vlan10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> rdomain 10 mtu 
> > > 1500
> > >         lladdr c8:be:19:e2:2c:ed
> > >         index 7 priority 0 llprio 3
> > >         vlan: 10 parent interface: re0
> > >         vnetid: 10
> > >         parent: re0
> > >         groups: vlan adsl
> > >         status: active
> > >         inet 192.168.10.1 netmask 0xffffff00 broadcast 192.168.10.255
> > >         inet 192.168.1.10 netmask 0xffffff00 broadcast 192.168.1.255
> > > 
> > > # ifconfig re0
> > > re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1504
> > >         lladdr c8:be:19:e2:2c:ed
> > >         index 1 priority 0 llprio 3
> > >         media: Ethernet autoselect (1000baseT full-duplex)
> > >         status: active
> > > 
> > > 
> > > From my pf configuration, I had removed queue with bandwidth and
> > > replaced them with:
> > > 
> > > queue fq on pppoe0 flows 1024 default
> > > queue fq on re0    flows 1024 default
> > > 
> > > Thanks.
> > > -- 
> > > Sebastien Marie
> > 
> > Hi, you've managed to find and open a big can of worms :-)
> > The reason for the panic is that fq-codel relies on features
> > provided by the new queueing interface and it would be counter-
> > productive to shove in a compatibility layer.  The new queueing
> > interface is used when an interface defines if_qstart callback
> > instead of an old if_start.  The good news is that the compat
> > shim that we have if_qstart_compat can be made to call into
> > ifq_serialize or do something similar since under KERNEL_LOCK
> > your ifq is serialized by definition.
> > 
> > Regarding your configuration, I'd advise you to use hfsc (the
> > default queueing) on pppoe0.  Set it to your *uplink* bandwidth:
> > 
> >   queue pppoe0-rootq on pppoe0 bandwidth 2M max 2M default
> > 
> > And use the flow queue only on the underlying interface.
> > 
> > Since this is what I've been advising folks to use, I didn't
> > manage to find this bug beforehands.  Apologies for that.
> > 
> > I'll try to come up with the diff to do ifq_serialize for
> > if_qstart_compat.
> 
> This turns out to be a rather contained problem.  if_enqueue
> is already going down the ifq_start -> ifq_serialize ->
> ifq_start_task -> if_qstart_compat chain and sppp code is
> using if_enqueue.  But it also calls if_start, which means
> that instead of calling if_qstart_compat directly from there,
> we need to let ifq_start do its magic.  I've got this tested
> at home.  OK?
>

Sebastien, can you please confirm that this solves your problem?
David, does this look reasonable to you?

> diff --git sys/net/if.c sys/net/if.c
> index 34fd1aa84d7..acea3ae53ea 100644
> --- sys/net/if.c
> +++ sys/net/if.c
> @@ -623,12 +623,13 @@ if_attach_ifq(struct ifnet *ifp, const struct ifq_ops 
> *newops, void *args)
>  
>  void
>  if_start(struct ifnet *ifp)
>  {
>       KASSERT(ifp->if_qstart == if_qstart_compat);
> -     if_qstart_compat(&ifp->if_snd);
> +     ifq_start(&ifp->if_snd);
>  }
> +
>  void
>  if_qstart_compat(struct ifqueue *ifq)
>  {
>       struct ifnet *ifp = ifq->ifq_if;
>       int s;

Reply via email to