On Sun, Apr 28, 2019 at 06:57:02PM -0300, Martin Pieuchot wrote: > On 23/04/19(Tue) 12:16, Olivier Antoine wrote: > > >Synopsis: panic: Stopped at kqueue_scan > > >Category: kernel i386 > > >Environment: > > System : OpenBSD 6.5 > > Details : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21 > > 19:50:46 MDT 2019 > > > > dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP > > > > Architecture: OpenBSD.i386 > > Machine : i386 > > >Description: > > Hi, since my last update I have regular panic crashes. 4 in two days. > > At least 3 of them, with certainty, occurred while I was accessing the > > Internet via my smartphone connected to my OpenBSD WiFi access point > > through my Allways-on VPN isakmp/ipsec/nppp relaying traffic in Tor. > > This setup works for years. > > > > The machine then displays something like: > > uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e > > kernel: page fault trap, code=0 > > Stopped at kqueue_scan+0x246: movl %eax,0(%ecx) > > ddb{1}> > > So this indicates that the `kqueue' is empty. It should not happen > because the caller, in your case npppd, always places a marker in the > list. > > Since the caller is not threaded and the syscall is executed with the > KERNEL_LOCK() held, we can supposed that another part of the kernel is > removing the marker. That would imply that the other part isn't running > with the KERNEL_LOCK() and requires a MP kernel. > > Could you try *very hard* to reproduce the problem with a kernel built > with the diff below? Hopefully you'll make it crash and we'll find the > bug. Otherwise we'll look for another possible cause of the marker > removal. > > Index: kern/kern_event.c > =================================================================== > RCS file: /cvs/src/sys/kern/kern_event.c,v > retrieving revision 1.101 > diff -u -p -r1.101 kern_event.c > --- kern/kern_event.c 27 Nov 2018 15:52:50 -0000 1.101 > +++ kern/kern_event.c 28 Apr 2019 21:47:25 -0000 > @@ -1052,6 +1052,8 @@ knote_drop(struct knote *kn, struct proc > struct kqueue *kq = kn->kn_kq; > struct klist *list; > > + KERNEL_ASSERT_LOCKED(); > + > if (kn->kn_fop->f_isfd) > list = &kq->kq_knlist[kn->kn_id]; > else >
i had a similar diff in my tree, and with some clues from this thread and the same panic from jmc@, points out that tun(4) calls tun_wakeup without KERNEL_LOCK. tun_wakeup calls selwakeup, which ends up in the kq code messing up the kn_head list. fun fun. below is an extremely clever diff to tun to avoid doing the wakeup without the kernel lock held. i say extremely clever because tun(4) is not marked as MPSAFE, which means the if_start handler gets called with KERNEL_LOCK taken by the stack on its behalf. tun_start then calls tun_wakeup with that implicit KERNEL_LOCK hold. i don't know why this hasn't blown up before. the network stack hasn't been run with the KERNEL_LOCK for ages now. tun_output with a custom if_enqueue handler would be a lot smarter, but that is a more invasive diff. ok? Index: net/if_tun.c =================================================================== RCS file: /cvs/src/sys/net/if_tun.c,v retrieving revision 1.184 diff -u -p -r1.184 if_tun.c --- net/if_tun.c 3 Feb 2019 23:04:49 -0000 1.184 +++ net/if_tun.c 29 Apr 2019 07:14:46 -0000 @@ -576,7 +576,6 @@ tun_output(struct ifnet *ifp, struct mbu return (error); } - tun_wakeup(tp); return (0); } Index: kern/kern_event.c =================================================================== RCS file: /cvs/src/sys/kern/kern_event.c,v retrieving revision 1.101 diff -u -p -r1.101 kern_event.c --- kern/kern_event.c 27 Nov 2018 15:52:50 -0000 1.101 +++ kern/kern_event.c 29 Apr 2019 07:14:46 -0000 @@ -1072,6 +1072,7 @@ knote_enqueue(struct knote *kn) struct kqueue *kq = kn->kn_kq; int s = splhigh(); + KERNEL_ASSERT_LOCKED(); KASSERT((kn->kn_status & KN_QUEUED) == 0); TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe); @@ -1089,6 +1090,7 @@ knote_dequeue(struct knote *kn) KASSERT(kn->kn_status & KN_QUEUED); + KERNEL_ASSERT_LOCKED(); TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); kn->kn_status &= ~KN_QUEUED; kq->kq_count--; tun_wakeup