With this diff, I haven't had a crash yet.
Usually I plant in less than 2 minutes, with this patch, after 30
intensive minutes it still holds.

On Mon, Apr 29, 2019 at 9:24 AM David Gwynne <[email protected]> wrote:
>
> On Sun, Apr 28, 2019 at 06:57:02PM -0300, Martin Pieuchot wrote:
> > On 23/04/19(Tue) 12:16, Olivier Antoine wrote:
> > > >Synopsis:    panic: Stopped at kqueue_scan
> > > >Category:    kernel i386
> > > >Environment:
> > >     System      : OpenBSD 6.5
> > >     Details     : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21
> > > 19:50:46 MDT 2019
> > >              
> > > [email protected]:/usr/src/sys/arch/i386/compile/GENERIC.MP
> > >
> > >     Architecture: OpenBSD.i386
> > >     Machine     : i386
> > > >Description:
> > > Hi, since my last update I have regular panic crashes. 4 in two days.
> > > At least 3 of them, with certainty, occurred while I was accessing the
> > > Internet via my smartphone connected to my OpenBSD WiFi access point
> > > through my Allways-on VPN isakmp/ipsec/nppp relaying traffic in Tor.
> > > This setup works for years.
> > >
> > > The machine then displays something like:
> > > uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e
> > > kernel: page fault trap, code=0
> > > Stopped at kqueue_scan+0x246: movl %eax,0(%ecx)
> > > ddb{1}>
> >
> > So this indicates that the `kqueue' is empty.  It should not happen
> > because the caller, in your case npppd, always places a marker in the
> > list.
> >
> > Since the caller is not threaded and the syscall is executed with the
> > KERNEL_LOCK() held, we can supposed that another part of the kernel is
> > removing the marker.  That would imply that the other part isn't running
> > with the KERNEL_LOCK() and requires a MP kernel.
> >
> > Could you try *very hard* to reproduce the problem with a kernel built
> > with the diff below?  Hopefully you'll make it crash and we'll find the
> > bug.  Otherwise we'll look for another possible cause of the marker
> > removal.
> >
> > Index: kern/kern_event.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_event.c,v
> > retrieving revision 1.101
> > diff -u -p -r1.101 kern_event.c
> > --- kern/kern_event.c 27 Nov 2018 15:52:50 -0000      1.101
> > +++ kern/kern_event.c 28 Apr 2019 21:47:25 -0000
> > @@ -1052,6 +1052,8 @@ knote_drop(struct knote *kn, struct proc
> >       struct kqueue *kq = kn->kn_kq;
> >       struct klist *list;
> >
> > +     KERNEL_ASSERT_LOCKED();
> > +
> >       if (kn->kn_fop->f_isfd)
> >               list = &kq->kq_knlist[kn->kn_id];
> >       else
> >
>
> i had a similar diff in my tree, and with some clues from this thread
> and the same panic from jmc@, points out that tun(4) calls tun_wakeup
> without KERNEL_LOCK. tun_wakeup calls selwakeup, which ends up in
> the kq code messing up the kn_head list. fun fun.
>
> below is an extremely clever diff to tun to avoid doing the wakeup
> without the kernel lock held. i say extremely clever because tun(4) is
> not marked as MPSAFE, which means the if_start handler gets called with
> KERNEL_LOCK taken by the stack on its behalf. tun_start then calls
> tun_wakeup with that implicit KERNEL_LOCK hold.
>
> i don't know why this hasn't blown up before. the network stack
> hasn't been run with the KERNEL_LOCK for ages now.
>
> tun_output with a custom if_enqueue handler would be a lot smarter,
> but that is a more invasive diff.
>
> ok?
>
> Index: net/if_tun.c
> ===================================================================
> RCS file: /cvs/src/sys/net/if_tun.c,v
> retrieving revision 1.184
> diff -u -p -r1.184 if_tun.c
> --- net/if_tun.c        3 Feb 2019 23:04:49 -0000       1.184
> +++ net/if_tun.c        29 Apr 2019 07:14:46 -0000
> @@ -576,7 +576,6 @@ tun_output(struct ifnet *ifp, struct mbu
>                 return (error);
>         }
>
> -       tun_wakeup(tp);
>         return (0);
>  }
>
> Index: kern/kern_event.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_event.c,v
> retrieving revision 1.101
> diff -u -p -r1.101 kern_event.c
> --- kern/kern_event.c   27 Nov 2018 15:52:50 -0000      1.101
> +++ kern/kern_event.c   29 Apr 2019 07:14:46 -0000
> @@ -1072,6 +1072,7 @@ knote_enqueue(struct knote *kn)
>         struct kqueue *kq = kn->kn_kq;
>         int s = splhigh();
>
> +       KERNEL_ASSERT_LOCKED();
>         KASSERT((kn->kn_status & KN_QUEUED) == 0);
>
>         TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe);
> @@ -1089,6 +1090,7 @@ knote_dequeue(struct knote *kn)
>
>         KASSERT(kn->kn_status & KN_QUEUED);
>
> +       KERNEL_ASSERT_LOCKED();
>         TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe);
>         kn->kn_status &= ~KN_QUEUED;
>         kq->kq_count--;
> tun_wakeup

Reply via email to