On Thu, Feb 22, 2024 at 04:16:57PM +0100, Martin Pieuchot wrote:
> On 21/02/24(Wed) 13:05, Claudio Jeker wrote:
> > On Tue, Feb 20, 2024 at 09:34:12PM +0100, Martin Pieuchot wrote:
> > > On 28/10/21(Thu) 05:45, Visa Hankala wrote:
> > > > On Wed, Oct 27, 2021 at 09:02:08PM -0400, Dave Voutila wrote:
> > > > > Dave Voutila <[email protected]> writes:
> > > > > 
> > > > > > Was tinkering on a bt(5) script for trying to debug an issue in 
> > > > > > vmm(4)
> > > > > > when I managed to start hitting a panic "wakeup: p_stat is 2" being
> > > > > > triggered by kqueue coming from the softnet kernel task.
> > > > > >
> > > > > > I'm running an amd64 kernel built from the tree today (latest CVS 
> > > > > > commit
> > > > > > id UynQo1r7kLKA0Q2p) with VMM_DEBUG option set and the defaults from
> > > > > > GENERIC.MP. Userland is from the latest amd snap.
> > > > > >
> > > > > > To reproduce, I'm running a single OpenBSD-current guest under 
> > > > > > vmd(8)
> > > > > > which I'm targeting with the following trivial btrace script I was
> > > > > > working on to use for debugging something in vmm(4):
> > > > > >
> > > > > > tracepoint:sched:sleep / pid == $1 && tid == $2 /{
> > > > > >   printf("pid %d, tid %d slept %d!\n", pid, tid, nsecs);
> > > > > > }
> > > > > >
> > > > > > tracepoint:sched:wakeup / pid == $1 && tid == $2 /{
> > > > > >   printf("pid %d, tid %d awoke %d!\n", pid, tid, nsecs);
> > > > > > }
> > > > > 
> > > > > Even easier reproduction: if you have 2 machines and can use 
> > > > > tcpbench(1)
> > > > > between them, then while tcpbench is running target it with the above
> > > > > btrace script. I've found running the script, killing it with ctrl-c,
> > > > > and re-running it 2-3 times triggers the panic on my laptop.
> > > > > 
> > > > > >
> > > > > > Both times this happened I was trying to sysupgrade the vmd(8) guest
> > > > > > while running the above btrace script. When I don't run the script,
> > > > > > there is no panic.
> > > > > >
> > > > > > Image of the full backtrace is here: https://imgur.com/a/swW1qoj
> > > > > >
> > > > > > Simple transcript of the call stack after the panic() call looks 
> > > > > > like:
> > > > > >
> > > > > > wakeup_n
> > > > > > kqueue_wakeup
> > > > > > knote
> > > > > > selwakekup
> > > > > > tun_enqueue
> > > > > > ether_output
> > > > > > ip_output
> > > > > > ip_forward
> > > > > > ip_input_if
> > > > > > ipv4_input
> > > > > > ether_input
> > > > > > if_input_process
> > > > > >
> > > > > > The other 3 cpu cores appeared to be in ipi handlers. (Image in that
> > > > > > imgur link)
> > > > 
> > > > I think the problem is recursion within the scheduler. Trace points
> > > > invoke wakeup() directly, which is unsafe when done from within the
> > > > run queue routines.
> > > > 
> > > > One approach to avoid the recursion is to defer the wakeup() with
> > > > a soft interrupt. The scheduler runs at an elevated system priority
> > > > level that blocks soft interrupts. The deferred wakeup() is issued once
> > > > the system priority level turns low enough.
> > > > 
> > > > Unfortunately, the patch will get broken when someone adds trace points
> > > > to soft interrupt scheduling...
> > > 
> > > Thanks for the report.  Sorry for the delay.  I'm now really interested
> > > in fixing this bug because I'm heavily using btrace(8) to analyse the
> > > scheduler and I hit this panic a couple of times.
> > > 
> > > The problem is that `pnext' might no longer be on the sleepqueue after a
> > > tracepoint inside setrunnable() fired.  Diff below fixes that by making
> > > wakeup_n() re-entrant.
> > > 
> > > I'm not very interested in committing this diff because it relies on a
> > > recursive SCHED_LOCK().  Instead I'd prefer to split wakeup_n() in two
> > > stages: first unlink the threads then call setrunnable().  This approach
> > > will help us untangle the sleepqueue but needs a bit more shuffling,
> > > like moving unsleep() out of setrunnable()...
> > > 
> > > Claudio, Visa do you agree?
> >
> > [...] 
> >
> > Ugh, this is too much magic for my little brain.
> > I would prefer to unqueue the procs onto a new local list and have a
> > special wakeup_proc for them. The code around wakeup and unsleep will need
> > some changes to unlock it properly.
> 
> Yes, that is what I had in mind.  Diff below does that and works for
> me(tm).
> 
> The first loop removes threads from the sleep queue and the second put
> them on a runqueue and optionally call tracepoints.
> 
> Note that unsleep() has been removed from setrunnable() to reduce the
> complexity of the code.  Messing, sometimes, with the state of the
> sleepqueue down in this path doesn't help me get a simpler picture of
> this machinery.
> 
> ok?

I like this a lot. I think this is going in the right direction.
Diff looks good but I only quickly looked at it.
Untangling unsleep() from setrunnable() is a good thing (I hope it works).

> Index: dev/pci/drm/drm_linux.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/drm/drm_linux.c,v
> retrieving revision 1.109
> diff -u -p -r1.109 drm_linux.c
> --- dev/pci/drm/drm_linux.c   21 Jan 2024 13:36:40 -0000      1.109
> +++ dev/pci/drm/drm_linux.c   22 Feb 2024 14:32:27 -0000
> @@ -162,7 +162,7 @@ wake_up_process(struct proc *p)
>       int s, rv;
>  
>       SCHED_LOCK(s);
> -     rv = wakeup_proc(p, NULL, 0);
> +     rv = wakeup_proc(p, 0);
>       SCHED_UNLOCK(s);
>       return rv;
>  }
> Index: kern/kern_sig.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_sig.c,v
> retrieving revision 1.321
> diff -u -p -r1.321 kern_sig.c
> --- kern/kern_sig.c   17 Jan 2024 22:22:25 -0000      1.321
> +++ kern/kern_sig.c   22 Feb 2024 14:25:16 -0000
> @@ -1200,6 +1200,7 @@ runfast:
>       if (p->p_usrpri > PUSER)
>               p->p_usrpri = PUSER;
>  run:
> +     unsleep(p);
>       setrunnable(p);
>  out:
>       /* finally adjust siglist */
> @@ -2104,6 +2105,7 @@ single_thread_set(struct proc *p, int fl
>               if (q->p_flag & P_WEXIT) {
>                       if (mode == SINGLE_EXIT) {
>                               if (q->p_stat == SSTOP) {
> +                                     unsleep(q);
>                                       setrunnable(q);
>                                       atomic_inc_int(&pr->ps_singlecount);
>                               }
> @@ -2125,12 +2127,14 @@ single_thread_set(struct proc *p, int fl
>                                       break;
>                               }
>                               /* need to unwind or exit, so wake it */
> +                             unsleep(q);
>                               setrunnable(q);
>                       }
>                       atomic_inc_int(&pr->ps_singlecount);
>                       break;
>               case SSTOP:
>                       if (mode == SINGLE_EXIT) {
> +                             unsleep(q);
>                               setrunnable(q);
>                               atomic_inc_int(&pr->ps_singlecount);
>                       }
> @@ -2198,9 +2202,10 @@ single_thread_clear(struct proc *p, int 
>                * it back into some sleep queue
>                */
>               if (q->p_stat == SSTOP && (q->p_flag & flag) == 0) {
> -                     if (q->p_wchan == NULL)
> +                     if (q->p_wchan == NULL) {
> +                             unsleep(q);
>                               setrunnable(q);
> -                     else {
> +                     } else {
>                               atomic_clearbits_int(&q->p_flag, P_WSLEEP);
>                               q->p_stat = SSLEEP;
>                       }
> Index: kern/kern_synch.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.200
> diff -u -p -r1.200 kern_synch.c
> --- kern/kern_synch.c 13 Sep 2023 14:25:49 -0000      1.200
> +++ kern/kern_synch.c 22 Feb 2024 14:55:05 -0000
> @@ -467,25 +467,23 @@ sleep_signal_check(void)
>  }
>  
>  int
> -wakeup_proc(struct proc *p, const volatile void *chan, int flags)
> +wakeup_proc(struct proc *p, int flags)
>  {
>       int awakened = 0;
>  
>       SCHED_ASSERT_LOCKED();
>  
> -     if (p->p_wchan != NULL &&
> -        ((chan == NULL) || (p->p_wchan == chan))) {
> +     if (p->p_wchan != NULL) {
>               awakened = 1;
>               if (flags)
>                       atomic_setbits_int(&p->p_flag, flags);
> -             if (p->p_stat == SSLEEP)
> -                     setrunnable(p);
> -             else if (p->p_stat == SSTOP)
> -                     unsleep(p);
>  #ifdef DIAGNOSTIC
> -             else
> -                     panic("wakeup: p_stat is %d", (int)p->p_stat);
> +             if (p->p_stat != SSLEEP && p->p_stat != SSTOP)
> +                     panic("thread %d p_stat is %d", p->p_tid, p->p_stat);
>  #endif
> +             unsleep(p);
> +             if (p->p_stat == SSLEEP)
> +                     setrunnable(p);
>       }
>  
>       return awakened;
> @@ -505,7 +503,7 @@ endtsleep(void *arg)
>       int s;
>  
>       SCHED_LOCK(s);
> -     wakeup_proc(p, NULL, P_TIMEOUT);
> +     wakeup_proc(p, P_TIMEOUT);
>       SCHED_UNLOCK(s);
>  }
>  
> @@ -531,21 +529,35 @@ unsleep(struct proc *p)
>  void
>  wakeup_n(const volatile void *ident, int n)
>  {
> -     struct slpque *qp;
> +     struct slpque *qp, wakeq;
>       struct proc *p;
>       struct proc *pnext;
>       int s;
>  
> +     TAILQ_INIT(&wakeq);
> +
>       SCHED_LOCK(s);
>       qp = &slpque[LOOKUP(ident)];
>       for (p = TAILQ_FIRST(qp); p != NULL && n != 0; p = pnext) {
>               pnext = TAILQ_NEXT(p, p_runq);
>  #ifdef DIAGNOSTIC
>               if (p->p_stat != SSLEEP && p->p_stat != SSTOP)
> -                     panic("wakeup: p_stat is %d", (int)p->p_stat);
> +                     panic("thread %d p_stat is %d", p->p_tid, p->p_stat);
>  #endif
> -             if (wakeup_proc(p, ident, 0))
> +             KASSERT(p->p_wchan != NULL);
> +             if (p->p_wchan == ident) {
> +                     TAILQ_REMOVE(qp, p, p_runq);
> +                     p->p_wchan = NULL;
> +                     TAILQ_INSERT_TAIL(&wakeq, p, p_runq);
>                       --n;
> +             }
> +     }
> +     while ((p = TAILQ_FIRST(&wakeq))) {
> +             TAILQ_REMOVE(&wakeq, p, p_runq);
> +             TRACEPOINT(sched, unsleep, p->p_tid + THREAD_PID_OFFSET,
> +                 p->p_p->ps_pid);
> +             if (p->p_stat == SSLEEP)
> +                     setrunnable(p);
>       }
>       SCHED_UNLOCK(s);
>  }
> Index: kern/sched_bsd.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/sched_bsd.c,v
> retrieving revision 1.90
> diff -u -p -r1.90 sched_bsd.c
> --- kern/sched_bsd.c  24 Jan 2024 19:23:38 -0000      1.90
> +++ kern/sched_bsd.c  22 Feb 2024 14:24:31 -0000
> @@ -500,12 +500,10 @@ setrunnable(struct proc *p)
>               if ((pr->ps_flags & PS_TRACED) != 0 && pr->ps_xsig != 0)
>                       atomic_setbits_int(&p->p_siglist, sigmask(pr->ps_xsig));
>               prio = p->p_usrpri;
> -             unsleep(p);
>               setrunqueue(NULL, p, prio);
>               break;
>       case SSLEEP:
>               prio = p->p_slppri;
> -             unsleep(p);             /* e.g. when sending signals */
>  
>               /* if not yet asleep, don't add to runqueue */
>               if (ISSET(p->p_flag, P_WSLEEP))
> Index: kern/sys_process.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/sys_process.c,v
> retrieving revision 1.95
> diff -u -p -r1.95 sys_process.c
> --- kern/sys_process.c        21 Nov 2023 14:00:13 -0000      1.95
> +++ kern/sys_process.c        22 Feb 2024 14:25:34 -0000
> @@ -493,6 +493,7 @@ ptrace_ctrl(struct proc *p, int req, pid
>               if (t->p_stat == SSTOP) {
>                       tr->ps_xsig = data;
>                       SCHED_LOCK(s);
> +                     unsleep(t);
>                       setrunnable(t);
>                       SCHED_UNLOCK(s);
>               } else {
> Index: sys/proc.h
> ===================================================================
> RCS file: /cvs/src/sys/sys/proc.h,v
> retrieving revision 1.356
> diff -u -p -r1.356 proc.h
> --- sys/proc.h        3 Feb 2024 18:51:58 -0000       1.356
> +++ sys/proc.h        22 Feb 2024 14:32:13 -0000
> @@ -555,7 +555,7 @@ void      procinit(void);
>  void setpriority(struct proc *, uint32_t, uint8_t);
>  void setrunnable(struct proc *);
>  void endtsleep(void *);
> -int  wakeup_proc(struct proc *, const volatile void *, int);
> +int  wakeup_proc(struct proc *, int);
>  void unsleep(struct proc *);
>  void reaper(void *);
>  __dead void exit1(struct proc *, int, int, int);

-- 
:wq Claudio

Reply via email to