On 05/06/22(Sun) 05:20, Visa Hankala wrote:
> Encountered the following panic:
> 
> panic: kernel diagnostic assertion "(p->p_flag & P_TIMEOUT) == 0" failed: 
> file "/usr/src/sys/kern/kern_synch.c", line 373
> Stopped at      db_enter+0x10:  popq    %rbp
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>  423109  57118     55         0x3          0    2  link
>  330695  30276     55         0x3          0    3  link
> * 46366  85501     55      0x1003  0x4080400    1  link
>  188803  85501     55      0x1003  0x4082000    0K link
> db_enter() at db_enter+0x10
> panic(ffffffff81f25d2b) at panic+0xbf
> __assert(ffffffff81f9a186,ffffffff81f372c8,175,ffffffff81f87c6c) at 
> __assert+0x25
> sleep_setup(ffff800022d64bf8,ffff800022d64c98,20,ffffffff81f66ac6,0) at 
> sleep_setup+0x1d8
> cond_wait(ffff800022d64c98,ffffffff81f66ac6) at cond_wait+0x46
> timeout_barrier(ffff8000228a28b0) at timeout_barrier+0x109
> timeout_del_barrier(ffff8000228a28b0) at timeout_del_barrier+0xa2
> sleep_finish(ffff800022d64d90,1) at sleep_finish+0x16d
> tsleep(ffffffff823a5130,120,ffffffff81f0b730,2) at tsleep+0xb2
> sys_nanosleep(ffff8000228a27f0,ffff800022d64ea0,ffff800022d64ef0) at 
> sys_nanosleep+0x12d
> syscall(ffff800022d64f60) at syscall+0x374
> 
> The panic is a regression of sys/kern/kern_timeout.c r1.84. Previously,
> soft-interrupt-driven timeouts could be deleted synchronously without
> blocking. Now, timeout_del_barrier() can sleep regardless of the type
> of the timeout.
> 
> It looks that with small adjustments timeout_del_barrier() can sleep
> in sleep_finish(). The management of run queues is not affected because
> the timeout clearing happens after it. As timeout_del_barrier() does not
> rely on a timeout or signal catching, there should be no risk of
> unbounded recursion or unwanted signal side effects within the sleep
> machinery. In a way, a sleep with a timeout is higher-level than
> one without.

I trust you on the analysis.  However this looks very fragile to me.

The use of timeout_del_barrier() which can sleep using the global sleep
queue is worrying me.  

> Note that endtsleep() can run and set P_TIMEOUT during
> timeout_del_barrier() when the thread is blocked in cond_wait().
> To avoid unnecessary atomic read-modify-write operations, the clearing
> of P_TIMEOUT could be conditional, but maybe that is an unnecessary
> optimization at this point.

I agree this optimization seems unnecessary at the moment.

> While it should be possible to make the code use timeout_del() instead
> of timeout_del_barrier(), the outcome might not be outright better. For
> example, sleep_setup() and endtsleep() would have to coordinate so that
> a late-running timeout from previous sleep cycle would not disturb the
> new cycle.

So that's the price for not having to sleep in sleep_finish(), right?

> To test the barrier path reliably, I made the code call
> timeout_del_barrier() twice in a row. The second call is guaranteed
> to sleep. Of course, this is not part of the patch.

ok mpi@

> Index: kern/kern_synch.c
> ===================================================================
> RCS file: src/sys/kern/kern_synch.c,v
> retrieving revision 1.187
> diff -u -p -r1.187 kern_synch.c
> --- kern/kern_synch.c 13 May 2022 15:32:00 -0000      1.187
> +++ kern/kern_synch.c 5 Jun 2022 05:04:45 -0000
> @@ -370,8 +370,8 @@ sleep_setup(struct sleep_state *sls, con
>       p->p_slppri = prio & PRIMASK;
>       TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_runq);
>  
> -     KASSERT((p->p_flag & P_TIMEOUT) == 0);
>       if (timo) {
> +             KASSERT((p->p_flag & P_TIMEOUT) == 0);
>               sls->sls_timeout = 1;
>               timeout_add(&p->p_sleep_to, timo);
>       }
> @@ -432,13 +432,12 @@ sleep_finish(struct sleep_state *sls, in
>  
>       if (sls->sls_timeout) {
>               if (p->p_flag & P_TIMEOUT) {
> -                     atomic_clearbits_int(&p->p_flag, P_TIMEOUT);
>                       error1 = EWOULDBLOCK;
>               } else {
> -                     /* This must not sleep. */
> +                     /* This can sleep. It must not use timeouts. */
>                       timeout_del_barrier(&p->p_sleep_to);
> -                     KASSERT((p->p_flag & P_TIMEOUT) == 0);
>               }
> +             atomic_clearbits_int(&p->p_flag, P_TIMEOUT);
>       }
>  
>       /* Check if thread was woken up because of a unwind or signal */
> 

Reply via email to