With this diff, I haven't had a crash yet. Usually I plant in less than 2 minutes, with this patch, after 30 intensive minutes it still holds.
On Mon, Apr 29, 2019 at 9:24 AM David Gwynne <[email protected]> wrote: > > On Sun, Apr 28, 2019 at 06:57:02PM -0300, Martin Pieuchot wrote: > > On 23/04/19(Tue) 12:16, Olivier Antoine wrote: > > > >Synopsis: panic: Stopped at kqueue_scan > > > >Category: kernel i386 > > > >Environment: > > > System : OpenBSD 6.5 > > > Details : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21 > > > 19:50:46 MDT 2019 > > > > > > [email protected]:/usr/src/sys/arch/i386/compile/GENERIC.MP > > > > > > Architecture: OpenBSD.i386 > > > Machine : i386 > > > >Description: > > > Hi, since my last update I have regular panic crashes. 4 in two days. > > > At least 3 of them, with certainty, occurred while I was accessing the > > > Internet via my smartphone connected to my OpenBSD WiFi access point > > > through my Allways-on VPN isakmp/ipsec/nppp relaying traffic in Tor. > > > This setup works for years. > > > > > > The machine then displays something like: > > > uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e > > > kernel: page fault trap, code=0 > > > Stopped at kqueue_scan+0x246: movl %eax,0(%ecx) > > > ddb{1}> > > > > So this indicates that the `kqueue' is empty. It should not happen > > because the caller, in your case npppd, always places a marker in the > > list. > > > > Since the caller is not threaded and the syscall is executed with the > > KERNEL_LOCK() held, we can supposed that another part of the kernel is > > removing the marker. That would imply that the other part isn't running > > with the KERNEL_LOCK() and requires a MP kernel. > > > > Could you try *very hard* to reproduce the problem with a kernel built > > with the diff below? Hopefully you'll make it crash and we'll find the > > bug. Otherwise we'll look for another possible cause of the marker > > removal. > > > > Index: kern/kern_event.c > > =================================================================== > > RCS file: /cvs/src/sys/kern/kern_event.c,v > > retrieving revision 1.101 > > diff -u -p -r1.101 kern_event.c > > --- kern/kern_event.c 27 Nov 2018 15:52:50 -0000 1.101 > > +++ kern/kern_event.c 28 Apr 2019 21:47:25 -0000 > > @@ -1052,6 +1052,8 @@ knote_drop(struct knote *kn, struct proc > > struct kqueue *kq = kn->kn_kq; > > struct klist *list; > > > > + KERNEL_ASSERT_LOCKED(); > > + > > if (kn->kn_fop->f_isfd) > > list = &kq->kq_knlist[kn->kn_id]; > > else > > > > i had a similar diff in my tree, and with some clues from this thread > and the same panic from jmc@, points out that tun(4) calls tun_wakeup > without KERNEL_LOCK. tun_wakeup calls selwakeup, which ends up in > the kq code messing up the kn_head list. fun fun. > > below is an extremely clever diff to tun to avoid doing the wakeup > without the kernel lock held. i say extremely clever because tun(4) is > not marked as MPSAFE, which means the if_start handler gets called with > KERNEL_LOCK taken by the stack on its behalf. tun_start then calls > tun_wakeup with that implicit KERNEL_LOCK hold. > > i don't know why this hasn't blown up before. the network stack > hasn't been run with the KERNEL_LOCK for ages now. > > tun_output with a custom if_enqueue handler would be a lot smarter, > but that is a more invasive diff. > > ok? > > Index: net/if_tun.c > =================================================================== > RCS file: /cvs/src/sys/net/if_tun.c,v > retrieving revision 1.184 > diff -u -p -r1.184 if_tun.c > --- net/if_tun.c 3 Feb 2019 23:04:49 -0000 1.184 > +++ net/if_tun.c 29 Apr 2019 07:14:46 -0000 > @@ -576,7 +576,6 @@ tun_output(struct ifnet *ifp, struct mbu > return (error); > } > > - tun_wakeup(tp); > return (0); > } > > Index: kern/kern_event.c > =================================================================== > RCS file: /cvs/src/sys/kern/kern_event.c,v > retrieving revision 1.101 > diff -u -p -r1.101 kern_event.c > --- kern/kern_event.c 27 Nov 2018 15:52:50 -0000 1.101 > +++ kern/kern_event.c 29 Apr 2019 07:14:46 -0000 > @@ -1072,6 +1072,7 @@ knote_enqueue(struct knote *kn) > struct kqueue *kq = kn->kn_kq; > int s = splhigh(); > > + KERNEL_ASSERT_LOCKED(); > KASSERT((kn->kn_status & KN_QUEUED) == 0); > > TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe); > @@ -1089,6 +1090,7 @@ knote_dequeue(struct knote *kn) > > KASSERT(kn->kn_status & KN_QUEUED); > > + KERNEL_ASSERT_LOCKED(); > TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); > kn->kn_status &= ~KN_QUEUED; > kq->kq_count--; > tun_wakeup
