On 28/10/21(Thu) 05:45, Visa Hankala wrote:
> On Wed, Oct 27, 2021 at 09:02:08PM -0400, Dave Voutila wrote:
> >
> > Dave Voutila <[email protected]> writes:
> >
> > > Was tinkering on a bt(5) script for trying to debug an issue in vmm(4)
> > > when I managed to start hitting a panic "wakeup: p_stat is 2" being
> > > triggered by kqueue coming from the softnet kernel task.
> > >
> > > I'm running an amd64 kernel built from the tree today (latest CVS commit
> > > id UynQo1r7kLKA0Q2p) with VMM_DEBUG option set and the defaults from
> > > GENERIC.MP. Userland is from the latest amd snap.
> > >
> > > To reproduce, I'm running a single OpenBSD-current guest under vmd(8)
> > > which I'm targeting with the following trivial btrace script I was
> > > working on to use for debugging something in vmm(4):
> > >
> > > tracepoint:sched:sleep / pid == $1 && tid == $2 /{
> > > printf("pid %d, tid %d slept %d!\n", pid, tid, nsecs);
> > > }
> > >
> > > tracepoint:sched:wakeup / pid == $1 && tid == $2 /{
> > > printf("pid %d, tid %d awoke %d!\n", pid, tid, nsecs);
> > > }
> >
> > Even easier reproduction: if you have 2 machines and can use tcpbench(1)
> > between them, then while tcpbench is running target it with the above
> > btrace script. I've found running the script, killing it with ctrl-c,
> > and re-running it 2-3 times triggers the panic on my laptop.
> >
> > >
> > > Both times this happened I was trying to sysupgrade the vmd(8) guest
> > > while running the above btrace script. When I don't run the script,
> > > there is no panic.
> > >
> > > Image of the full backtrace is here: https://imgur.com/a/swW1qoj
> > >
> > > Simple transcript of the call stack after the panic() call looks like:
> > >
> > > wakeup_n
> > > kqueue_wakeup
> > > knote
> > > selwakeup
> > > tun_enqueue
> > > ether_output
> > > ip_output
> > > ip_forward
> > > ip_input_if
> > > ipv4_input
> > > ether_input
> > > if_input_process
> > >
> > > The other 3 cpu cores appeared to be in ipi handlers. (Image in that
> > > imgur link)
>
> I think the problem is recursion within the scheduler. Trace points
> invoke wakeup() directly, which is unsafe when done from within the
> run queue routines.
>
> One approach to avoid the recursion is to defer the wakeup() with
> a soft interrupt. The scheduler runs at an elevated system priority
> level that blocks soft interrupts. The deferred wakeup() is issued once
> the system priority level turns low enough.
>
> Unfortunately, the patch will get broken when someone adds trace points
> to soft interrupt scheduling...
Thanks for the report. Sorry for the delay. I'm now really interested
in fixing this bug because I'm heavily using btrace(8) to analyse the
scheduler and I hit this panic a couple of times.
The problem is that `pnext' might no longer be on the sleep queue after
a tracepoint inside setrunnable() has fired. The diff below fixes that
by making wakeup_n() re-entrant.
I'm not very interested in committing this diff because it relies on a
recursive SCHED_LOCK(). Instead I'd prefer to split wakeup_n() into two
stages: first unlink the threads, then call setrunnable(). This approach
will help us untangle the sleepqueue but needs a bit more shuffling,
like moving unsleep() out of setrunnable()...
Claudio, Visa, do you agree?
Index: kern/kern_synch.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.200
diff -u -p -r1.200 kern_synch.c
--- kern/kern_synch.c 13 Sep 2023 14:25:49 -0000 1.200
+++ kern/kern_synch.c 20 Feb 2024 19:51:53 -0000
@@ -484,7 +484,7 @@ wakeup_proc(struct proc *p, const volati
unsleep(p);
#ifdef DIAGNOSTIC
else
- panic("wakeup: p_stat is %d", (int)p->p_stat);
+ panic("thread %d p_stat is %d", p->p_tid, p->p_stat);
#endif
}
@@ -532,21 +532,35 @@ void
wakeup_n(const volatile void *ident, int n)
{
struct slpque *qp;
- struct proc *p;
- struct proc *pnext;
+ struct proc *p, piter;
int s;
+ memset(&piter, 0, sizeof(piter));
+ piter.p_stat = SMARKER;
+
SCHED_LOCK(s);
qp = &slpque[LOOKUP(ident)];
- for (p = TAILQ_FIRST(qp); p != NULL && n != 0; p = pnext) {
- pnext = TAILQ_NEXT(p, p_runq);
+ TAILQ_INSERT_HEAD(qp, &piter, p_runq);
+ while (n != 0) {
+ p = TAILQ_NEXT(&piter, p_runq);
+ if (p == NULL)
+ break;
+
+ /* Move marker forward */
+ TAILQ_REMOVE(qp, &piter, p_runq);
+ TAILQ_INSERT_AFTER(qp, p, &piter, p_runq);
+
+ if (p->p_stat == SMARKER)
+ continue;
+
#ifdef DIAGNOSTIC
if (p->p_stat != SSLEEP && p->p_stat != SSTOP)
- panic("wakeup: p_stat is %d", (int)p->p_stat);
+ panic("thread %d p_stat is %d", p->p_tid, p->p_stat);
#endif
if (wakeup_proc(p, ident, 0))
--n;
}
+ TAILQ_REMOVE(qp, &piter, p_runq);
SCHED_UNLOCK(s);
}
Index: sys/proc.h
===================================================================
RCS file: /cvs/src/sys/sys/proc.h,v
retrieving revision 1.356
diff -u -p -r1.356 proc.h
--- sys/proc.h 3 Feb 2024 18:51:58 -0000 1.356
+++ sys/proc.h 20 Feb 2024 19:54:43 -0000
@@ -414,6 +414,7 @@ struct proc {
#define SZOMB 5 /* unused */
#define SDEAD 6 /* Thread is almost gone */
#define SONPROC 7 /* Thread is currently on a CPU. */
+#define SMARKER 127 /* Used by markers in reentrant iterator */
#define P_ZOMBIE(p) ((p)->p_stat == SDEAD)
#define P_HASSIBLING(p) ((p)->p_p->ps_threadcnt > 1)