On Sun, Jun 05, 2022 at 12:27:32PM +0200, Martin Pieuchot wrote:
> On 05/06/22(Sun) 05:20, Visa Hankala wrote:
> > Encountered the following panic:
> > 
> > panic: kernel diagnostic assertion "(p->p_flag & P_TIMEOUT) == 0" failed: 
> > file "/usr/src/sys/kern/kern_synch.c", line 373
> > Stopped at      db_enter+0x10:  popq    %rbp
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> >  423109  57118     55         0x3          0    2  link
> >  330695  30276     55         0x3          0    3  link
> > * 46366  85501     55      0x1003  0x4080400    1  link
> >  188803  85501     55      0x1003  0x4082000    0K link
> > db_enter() at db_enter+0x10
> > panic(ffffffff81f25d2b) at panic+0xbf
> > __assert(ffffffff81f9a186,ffffffff81f372c8,175,ffffffff81f87c6c) at 
> > __assert+0x25
> > sleep_setup(ffff800022d64bf8,ffff800022d64c98,20,ffffffff81f66ac6,0) at 
> > sleep_setup+0x1d8
> > cond_wait(ffff800022d64c98,ffffffff81f66ac6) at cond_wait+0x46
> > timeout_barrier(ffff8000228a28b0) at timeout_barrier+0x109
> > timeout_del_barrier(ffff8000228a28b0) at timeout_del_barrier+0xa2
> > sleep_finish(ffff800022d64d90,1) at sleep_finish+0x16d
> > tsleep(ffffffff823a5130,120,ffffffff81f0b730,2) at tsleep+0xb2
> > sys_nanosleep(ffff8000228a27f0,ffff800022d64ea0,ffff800022d64ef0) at 
> > sys_nanosleep+0x12d
> > syscall(ffff800022d64f60) at syscall+0x374
> > 
> > The panic is a regression of sys/kern/kern_timeout.c r1.84. Previously,
> > soft-interrupt-driven timeouts could be deleted synchronously without
> > blocking. Now, timeout_del_barrier() can sleep regardless of the type
> > of the timeout.
> > 
> > It looks that with small adjustments timeout_del_barrier() can sleep
> > in sleep_finish(). The management of run queues is not affected because
> > the timeout clearing happens after it. As timeout_del_barrier() does not
> > rely on a timeout or signal catching, there should be no risk of
> > unbounded recursion or unwanted signal side effects within the sleep
> > machinery. In a way, a sleep with a timeout is higher-level than
> > one without.
> 
> I trust you on the analysis.  However this looks very fragile to me.
> 
> The use of timeout_del_barrier() which can sleep using the global sleep
> queue is worrying me.  

I think the queue handling ends in sleep_finish() when SCHED_LOCK()
is released. The timeout clearing is done outside of it.

The extra sleeping point inside sleep_finish() is subtle. It should not
matter in typical use. But is it permissible with the API? Also, if
timeout_del_barrier() sleeps, the thread's priority can change.

Note that sleep_finish() already can take an additional nap when
signal catching is enabled.

> > Note that endtsleep() can run and set P_TIMEOUT during
> > timeout_del_barrier() when the thread is blocked in cond_wait().
> > To avoid unnecessary atomic read-modify-write operations, the clearing
> > of P_TIMEOUT could be conditional, but maybe that is an unnecessary
> > optimization at this point.
> 
> I agree this optimization seems unnecessary at the moment.
> 
> > While it should be possible to make the code use timeout_del() instead
> > of timeout_del_barrier(), the outcome might not be outright better. For
> > example, sleep_setup() and endtsleep() would have to coordinate so that
> > a late-running timeout from previous sleep cycle would not disturb the
> > new cycle.
> 
> So that's the price for not having to sleep in sleep_finish(), right?

That is correct. Some synchronization is needed in any case.

> > To test the barrier path reliably, I made the code call
> > timeout_del_barrier() twice in a row. The second call is guaranteed
> > to sleep. Of course, this is not part of the patch.
> 
> ok mpi@
> 
> > Index: kern/kern_synch.c
> > ===================================================================
> > RCS file: src/sys/kern/kern_synch.c,v
> > retrieving revision 1.187
> > diff -u -p -r1.187 kern_synch.c
> > --- kern/kern_synch.c       13 May 2022 15:32:00 -0000      1.187
> > +++ kern/kern_synch.c       5 Jun 2022 05:04:45 -0000
> > @@ -370,8 +370,8 @@ sleep_setup(struct sleep_state *sls, con
> >     p->p_slppri = prio & PRIMASK;
> >     TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_runq);
> >  
> > -   KASSERT((p->p_flag & P_TIMEOUT) == 0);
> >     if (timo) {
> > +           KASSERT((p->p_flag & P_TIMEOUT) == 0);
> >             sls->sls_timeout = 1;
> >             timeout_add(&p->p_sleep_to, timo);
> >     }
> > @@ -432,13 +432,12 @@ sleep_finish(struct sleep_state *sls, in
> >  
> >     if (sls->sls_timeout) {
> >             if (p->p_flag & P_TIMEOUT) {
> > -                   atomic_clearbits_int(&p->p_flag, P_TIMEOUT);
> >                     error1 = EWOULDBLOCK;
> >             } else {
> > -                   /* This must not sleep. */
> > +                   /* This can sleep. It must not use timeouts. */
> >                     timeout_del_barrier(&p->p_sleep_to);
> > -                   KASSERT((p->p_flag & P_TIMEOUT) == 0);
> >             }
> > +           atomic_clearbits_int(&p->p_flag, P_TIMEOUT);
> >     }
> >  
> >     /* Check if thread was woken up because of a unwind or signal */
> > 
> 

Reply via email to