On 29/04/19(Mon) 17:24, David Gwynne wrote:
> On Sun, Apr 28, 2019 at 06:57:02PM -0300, Martin Pieuchot wrote:
> > On 23/04/19(Tue) 12:16, Olivier Antoine wrote:
> > > >Synopsis:    panic: Stopped at kqueue_scan
> > > >Category:    kernel i386
> > > >Environment:
> > >     System      : OpenBSD 6.5
> > >     Details     : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21
> > > 19:50:46 MDT 2019
> > >              
> > > [email protected]:/usr/src/sys/arch/i386/compile/GENERIC.MP
> > > 
> > >     Architecture: OpenBSD.i386
> > >     Machine     : i386
> > > >Description:
> > > Hi, since my last update I have regular panic crashes. 4 in two days.
> > > At least 3 of them, with certainty, occurred while I was accessing the
> > > Internet via my smartphone connected to my OpenBSD WiFi access point
> > > through my Allways-on VPN isakmp/ipsec/nppp relaying traffic in Tor.
> > > This setup works for years.
> > > 
> > > The machine then displays something like:
> > > uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e
> > > kernel: page fault trap, code=0
> > > Stopped at kqueue_scan+0x246: movl %eax,0(%ecx)
> > > ddb{1}>
> > 
> > So this indicates that the `kqueue' is empty.  It should not happen
> > because the caller, in your case npppd, always places a marker in the
> > list.
> > 
> > Since the caller is not threaded and the syscall is executed with the
> > KERNEL_LOCK() held, we can supposed that another part of the kernel is
> > removing the marker.  That would imply that the other part isn't running
> > with the KERNEL_LOCK() and requires a MP kernel.
> > 
> > Could you try *very hard* to reproduce the problem with a kernel built
> > with the diff below?  Hopefully you'll make it crash and we'll find the
> > bug.  Otherwise we'll look for another possible cause of the marker
> > removal.
> > 
> > Index: kern/kern_event.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_event.c,v
> > retrieving revision 1.101
> > diff -u -p -r1.101 kern_event.c
> > --- kern/kern_event.c       27 Nov 2018 15:52:50 -0000      1.101
> > +++ kern/kern_event.c       28 Apr 2019 21:47:25 -0000
> > @@ -1052,6 +1052,8 @@ knote_drop(struct knote *kn, struct proc
> >     struct kqueue *kq = kn->kn_kq;
> >     struct klist *list;
> >  
> > +   KERNEL_ASSERT_LOCKED();
> > +
> >     if (kn->kn_fop->f_isfd)
> >             list = &kq->kq_knlist[kn->kn_id];
> >     else
> > 
> 
> i had a similar diff in my tree, and with some clues from this thread
> and the same panic from jmc@, points out that tun(4) calls tun_wakeup
> without KERNEL_LOCK. tun_wakeup calls selwakeup, which ends up in
> the kq code messing up the kn_head list. fun fun.
> 
> below is an extremely clever diff to tun to avoid doing the wakeup
> without the kernel lock held. i say extremely clever because tun(4) is
> not marked as MPSAFE, which means the if_start handler gets called with
> KERNEL_LOCK taken by the stack on its behalf. tun_start then calls
> tun_wakeup with that implicit KERNEL_LOCK hold.
> 
> i don't know why this hasn't blown up before. the network stack
> hasn't been run with the KERNEL_LOCK for ages now.
> 
> tun_output with a custom if_enqueue handler would be a lot smarter,
> but that is a more invasive diff.

I'd prefer if we could grab the KERNEK_LOCK() around csignal() and
selwakeup() like it is done in the socket code.  I believe that we
need to show where the lock is taken in order to help people realize
where it needs to be pushed down.

Can we solve both issues with a similar fix?  How hard can it be to get
selwakeup() out of the KERNEL_LOCK()?

Should we add a KASSERT in csignal() too?

> Index: net/if_tun.c
> ===================================================================
> RCS file: /cvs/src/sys/net/if_tun.c,v
> retrieving revision 1.184
> diff -u -p -r1.184 if_tun.c
> --- net/if_tun.c      3 Feb 2019 23:04:49 -0000       1.184
> +++ net/if_tun.c      29 Apr 2019 07:14:46 -0000
> @@ -576,7 +576,6 @@ tun_output(struct ifnet *ifp, struct mbu
>               return (error);
>       }
>  
> -     tun_wakeup(tp);
>       return (0);
>  }
>  
> Index: kern/kern_event.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_event.c,v
> retrieving revision 1.101
> diff -u -p -r1.101 kern_event.c
> --- kern/kern_event.c 27 Nov 2018 15:52:50 -0000      1.101
> +++ kern/kern_event.c 29 Apr 2019 07:14:46 -0000
> @@ -1072,6 +1072,7 @@ knote_enqueue(struct knote *kn)
>       struct kqueue *kq = kn->kn_kq;
>       int s = splhigh();
>  
> +     KERNEL_ASSERT_LOCKED();
>       KASSERT((kn->kn_status & KN_QUEUED) == 0);
>  
>       TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe);
> @@ -1089,6 +1090,7 @@ knote_dequeue(struct knote *kn)
>  
>       KASSERT(kn->kn_status & KN_QUEUED);
>  
> +     KERNEL_ASSERT_LOCKED();
>       TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe);
>       kn->kn_status &= ~KN_QUEUED;
>       kq->kq_count--;
> tun_wakeup

Reply via email to