On 1/6/26 1:57 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 06, 2026 at 09:46:30AM +0800, Hillf Danton wrote:
>>> taking vq mutex in a kill handler is probably not wise.
>>> we should have a separate lock just for handling worker
>>> assignment.
>>>
>> Better not before showing us the root cause of the hang to
>> avoid adding a blind lock.
>
> Well I think it's pretty clear but the issue is that just another lock
> is not enough, we have bigger problems with this mutex.
>
> It's held around userspace accesses so if the vhost thread gets into
> uninterruptible sleep holding that, a userspace thread trying to take it
> with mutex_lock will be uninterruptible.
>
> So it propagates the uninterruptible status between vhost and a
> userspace thread.
>
> It's not a new issue but the new(ish) thread management APIs make
> it more visible.
>
> Here it's the kill handler that got hung but it's not really limited
> to that, any ioctl can do that, and I do not want to add another
> lock on data path.
>
Above, are you saying that the kill handler and an ioctl are both trying
to take the virtqueue->mutex in this bug?

I've been trying to replicate this for a while, but I can't hit what the
lockdep info from the initial email shows. There we only see the kill
handler trying to take the virtqueue->mutex. Is the theory that the
reported locking info is incomplete, i.e. a userspace thread is in an
ioctl that took the mutex but isn't shown below?
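Just to make sure I follow the issue being described above, my
understanding of the pattern is roughly the sketch below. This is a
simplified sketch with placeholder names (sketch_handle_tx and
sketch_vring_ioctl are stand-ins), not the actual vhost data-path or
ioctl code, and it assumes the struct vhost_virtqueue definition from
drivers/vhost/vhost.h:

#include <linux/mutex.h>
#include <linux/printk.h>
#include <linux/uaccess.h>
#include "vhost.h" /* struct vhost_virtqueue (assumption) */

/* vhost worker side: userspace access done while holding vq->mutex */
static void sketch_handle_tx(struct vhost_virtqueue *vq,
                             void __user *uaddr, void *buf, size_t len)
{
        mutex_lock(&vq->mutex);
        /*
         * Userspace access under vq->mutex. If the fault this can
         * trigger never completes, the worker sleeps uninterruptibly
         * while still holding vq->mutex.
         */
        if (copy_from_user(buf, uaddr, len))
                pr_debug("sketch: copy_from_user failed\n");
        mutex_unlock(&vq->mutex);
}

/* userspace side: any ioctl that needs the same vq->mutex */
static long sketch_vring_ioctl(struct vhost_virtqueue *vq)
{
        /*
         * mutex_lock() waits in TASK_UNINTERRUPTIBLE, so this thread
         * now shares the worker's uninterruptible state until the
         * worker makes progress and drops vq->mutex.
         */
        mutex_lock(&vq->mutex);
        /* vring setup/teardown elided */
        mutex_unlock(&vq->mutex);
        return 0;
}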
Originally I used the vhost_dev->mutex for the locking in
vhost_worker_killed, but I saw we could take that during ioctls that do
a flush, so I added the vhost_worker->mutex for some of the locking.

If the virtqueue->mutex is also an issue, I can do a patch.
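For reference, my reading of the kill-handler lock ordering reported for
vhost-7617/7618 in the trace below is roughly the following. This is a
sketch of the ordering, not a verbatim copy of drivers/vhost/vhost.c;
sketch_worker_killed() is a placeholder name and it assumes the
vhost_worker/vhost_dev fields from drivers/vhost/vhost.h:

#include <linux/mutex.h>
#include "vhost.h" /* vhost_worker, vhost_dev, vhost_virtqueue (assumption) */

/*
 * #0 in the trace, vtsk->exit_mutex, is already held by vhost_task_fn()
 * by the time the kill handler runs.
 */
static void sketch_worker_killed(struct vhost_worker *worker)
{
        struct vhost_dev *dev = worker->dev;
        int i;

        /* #1 in the trace: worker->mutex (vhost.c:470) */
        mutex_lock(&worker->mutex);

        for (i = 0; i < dev->nvqs; i++) {
                struct vhost_virtqueue *vq = dev->vqs[i];

                /* #2 in the trace: vq->mutex (vhost.c:476), taken per vq */
                mutex_lock(&vq->mutex);
                /* detach this worker from the vq, details elided */
                mutex_unlock(&vq->mutex);
        }

        mutex_unlock(&worker->mutex);
}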
Showing all locks held in the system:
1 lock held by khungtaskd/32:
#0: ffffffff8df41aa0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire
include/linux/rcupdate.h:331 [inline]
#0: ffffffff8df41aa0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock
include/linux/rcupdate.h:867 [inline]
#0: ffffffff8df41aa0 (rcu_read_lock){....}-{1:3}, at:
debug_show_all_locks+0x2e/0x180 kernel/locking/lockdep.c:6775
2 locks held by getty/5579:
#0: ffff88814e3cb0a0 (&tty->ldisc_sem){++++}-{0:0}, at:
tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
#1: ffffc9000332b2f0 (&ldata->atomic_read_lock){+.+.}-{4:4}, at:
n_tty_read+0x449/0x1460 drivers/tty/n_tty.c:2211
1 lock held by syz-executor/5978:
#0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at: exp_funnel_lock
kernel/rcu/tree_exp.h:311 [inline]
#0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at:
synchronize_rcu_expedited+0x2b1/0x6e0 kernel/rcu/tree_exp.h:956
2 locks held by syz.5.259/7601:
3 locks held by vhost-7617/7618:
#0: ffff888054cc68e8 (&vtsk->exit_mutex){+.+.}-{4:4}, at:
vhost_task_fn+0x322/0x430 kernel/vhost_task.c:54
#1: ffff888024646a80 (&worker->mutex){+.+.}-{4:4}, at:
vhost_worker_killed+0x57/0x390 drivers/vhost/vhost.c:470
#2: ffff8880550c0258 (&vq->mutex){+.+.}-{4:4}, at:
vhost_worker_killed+0x12b/0x390 drivers/vhost/vhost.c:476
1 lock held by syz-executor/7850:
#0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at: exp_funnel_lock
kernel/rcu/tree_exp.h:343 [inline]
#0: ffffffff8df475f8 (rcu_state.exp_mutex){+.+.}-{4:4}, at:
synchronize_rcu_expedited+0x36e/0x6e0 kernel/rcu/tree_exp.h:956
1 lock held by syz.2.640/9940:
4 locks held by syz.3.641/9946:
3 locks held by syz.1.642/9954: