On Fri, Jun 17, 2022 at 04:25:48PM +0300, Mikhail wrote:
> I was debugging tog in lldb and in second tmux window opened another
> bare tog instance, after a second I got this panic:
> 
> panic: kernel diagnostic assetion "p->p_kq->kq_refcnt.r_refs == 1"
> failed file "/usr/src/sys/kern/kern_event.c", line 839
> 
> There were also couple of xterms and chrome launched.
> 
> There was an update of kern_event.c from 12 Jun - not sure if it's the
> fix for this panic or not.
> 
> After the panic I updated to the latest snapshot, and can't reproduce it
> anymore, but maybe someone will have a clue.

The 12 Jun kern_event.c commit is unrelated.

This report shows no kernel stack trace, so I don't know if the panic
was caused by some unexpected thread exit path.

However, it looks that there is a problem with kqueue_task(). Even
though the task holds a reference to the kqueue, the task should be
cleared before the kqueue is deleted. Otherwise, the kqueue's lifetime
can extend beyond that of the file descriptor table, causing
a use-after-free in KQRELE(). In addition, the task clearing should
avoid the unexpected reference count in kqpoll_exit().


The lifetime bug can be lured out by adding a brief sleep between
taskq_next_work() and (*work.t_func)(work.t_arg) in taskq_thread().
With the sleep in place, regress/sys/kern/kqueue causes the following
panic:

panic: pool_do_get: fdescpl free list modified: page 0xfffffd811cf0e000; item 
addr 0xfffffd811cf0e888; offset 0x48=0xdead4113
Stopped at      db_enter+0x10:  popq    %rbp
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*338246  40644   1001    0x100003          0    3K make
db_enter() at db_enter+0x10
panic(ffffffff81f841ac) at panic+0xbf
pool_do_get(ffffffff823c2fb8,9,ffff8000226dc904) at pool_do_get+0x35c
pool_get(ffffffff823c2fb8,9) at pool_get+0x96
fdcopy(ffff8000ffff13d0) at fdcopy+0x38
process_new(ffff8000ffff9500,ffff8000ffff13d0,1) at process_new+0x107
fork1(ffff8000ffff6d30,1,ffffffff81a6eab0,0,ffff8000226dcb00,0) at fork1+0x236
syscall(ffff8000226dcb70) at syscall+0x374
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7fffff28e0, count: 6


The following patch replaces the task_del() with taskq_del_barrier()
but also makes the barrier conditional to prior task usage. This avoids
the barrier invocation in the typical case where there is no kqueue
nesting (or poll(2)'ing or select(2)'ing of kqueues).

The new barrier adds a lock order constraint. The locks that the thread
can hold when calling kqueue_terminate() should not be taken by tasks
that are run by systqmp. If this becomes a problem in the future,
kqueue can have its own taskq.


Index: kern/kern_event.c
===================================================================
RCS file: src/sys/kern/kern_event.c,v
retrieving revision 1.189
diff -u -p -r1.189 kern_event.c
--- kern/kern_event.c   12 Jun 2022 10:34:36 -0000      1.189
+++ kern/kern_event.c   19 Jun 2022 10:38:45 -0000
@@ -1581,6 +1581,7 @@ void
 kqueue_terminate(struct proc *p, struct kqueue *kq)
 {
        struct knote *kn;
+       int state;
 
        mtx_enter(&kq->kq_lock);
 
@@ -1593,11 +1594,17 @@ kqueue_terminate(struct proc *p, struct 
                KASSERT(kn->kn_filter == EVFILT_MARKER);
 
        kq->kq_state |= KQ_DYING;
+       state = kq->kq_state;
        kqueue_wakeup(kq);
        mtx_leave(&kq->kq_lock);
 
+       /*
+        * Any knotes that were attached to this kqueue were deleted
+        * by knote_fdclose() when this kqueue's file descriptor was closed.
+        */
        KASSERT(klist_empty(&kq->kq_sel.si_note));
-       task_del(systqmp, &kq->kq_task);
+       if (state & KQ_TASK)
+               taskq_del_barrier(systqmp, &kq->kq_task);
 }
 
 int
@@ -1623,7 +1630,6 @@ kqueue_task(void *arg)
        mtx_enter(&kqueue_klist_lock);
        KNOTE(&kq->kq_sel.si_note, 0);
        mtx_leave(&kqueue_klist_lock);
-       KQRELE(kq);
 }
 
 void
@@ -1637,9 +1643,8 @@ kqueue_wakeup(struct kqueue *kq)
        }
        if (!klist_empty(&kq->kq_sel.si_note)) {
                /* Defer activation to avoid recursion. */
-               KQREF(kq);
-               if (!task_add(systqmp, &kq->kq_task))
-                       KQRELE(kq);
+               kq->kq_state |= KQ_TASK;
+               task_add(systqmp, &kq->kq_task);
        }
 }
 
Index: sys/eventvar.h
===================================================================
RCS file: src/sys/sys/eventvar.h,v
retrieving revision 1.14
diff -u -p -r1.14 eventvar.h
--- sys/eventvar.h      16 Mar 2022 14:38:43 -0000      1.14
+++ sys/eventvar.h      19 Jun 2022 10:38:45 -0000
@@ -68,6 +68,7 @@ struct kqueue {
 #define KQ_SEL         0x01
 #define KQ_SLEEP       0x02
 #define KQ_DYING       0x04
+#define KQ_TASK                0x08
 };
 
 #endif /* !_SYS_EVENTVAR_H_ */

Reply via email to