On Tue, May 05, 2026 at 06:22:25PM -0400, Zack Rusin wrote:
> vmw_fence_fifo_down() drops fman->lock to wait on a fence and, on
> timeout, mutates fman->fence_list via list_del_init() and signals
> the fence without re-acquiring the lock. __vmw_fences_update() walks
> and removes entries from the same list under fman->lock from any
> other waiter, the fence-IRQ thread, or vmw_fences_update(), so the
> unlocked list_del_init() can corrupt the list head.
>
> Re-take fman->lock before manipulating fence->head and use
> dma_fence_signal_locked(). Wrap the locked signalling in
> dma_fence_begin_signalling() / dma_fence_end_signalling() so the
> lockdep annotation that dma_fence_signal() previously provided is
> preserved (the same pattern as __vmw_fences_update()).
>
> dma_fence_put() is moved outside the lock to avoid a recursive
> acquire from vmw_fence_obj_destroy(), which also takes fman->lock.
>
Just looking as someone who is curious about AI. This is not my driver,
but from a quick look at the vmwgfx code this almost certainly looks
like a good fix.

> Fixes: ae2a104058e2 ("vmwgfx: Implement fence objects")
> Cc: [email protected]
> Assisted-by: Claude:claude-opus-4.7
> Signed-off-by: Zack Rusin <[email protected]>
> ---
> drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> index 4ef84ff9b638..384c6736cf6b 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> @@ -367,13 +367,24 @@ void vmw_fence_fifo_down(struct vmw_fence_manager *fman)
> ret = vmw_fence_obj_wait(fence, false, false,
> VMW_FENCE_WAIT_TIMEOUT);
>
> + spin_lock(&fman->lock);
> if (unlikely(ret != 0)) {
> + bool cookie = dma_fence_begin_signalling();
> +
> list_del_init(&fence->head);
> - dma_fence_signal(&fence->base);
> + if (fence->waiter_added) {
> + vmw_seqno_waiter_remove(fman->dev_priv);
> + fence->waiter_added = false;
> + }
> + dma_fence_signal_locked(&fence->base);
> + dma_fence_end_signalling(cookie);
> }
>
> BUG_ON(!list_empty(&fence->head));
> + spin_unlock(&fman->lock);
> +
You can likely drop the spin_unlock/spin_lock dance around the put here,
as a put is just a refcount decrement, or vmw_fence_obj_destroy on the
final put, which seems to resolve to kfree_rcu in any case. Of course,
if you want to be paranoid, it is perfectly fine to drop and reacquire
the lock.
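
For illustration only, here is a rough userspace model of the loop with
the put left under the lock. It is not vmwgfx code: a pthread mutex
stands in for fman->lock, every model_* name is hypothetical, and it
assumes that the final put never takes the manager lock and that the
list holds one reference per fence. If the real release path does take
fman->lock, the put has to stay outside the lock as in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a fence; not the vmwgfx structure. */
struct model_fence {
	struct model_fence *next;
	int refcount;		/* only touched under fman->lock in this model */
	bool signaled;
};

/* Hypothetical stand-in for the fence manager. */
struct model_fman {
	pthread_mutex_t lock;	/* models fman->lock */
	struct model_fence *head;
	int freed;		/* counts final puts, for checking only */
};

static void model_fman_init(struct model_fman *fman)
{
	pthread_mutex_init(&fman->lock, NULL);
	fman->head = NULL;
	fman->freed = 0;
}

/* Add a fence; the list owns the initial reference (model assumption). */
static bool model_fman_add(struct model_fman *fman)
{
	struct model_fence *f = calloc(1, sizeof(*f));

	if (!f)
		return false;
	f->refcount = 1;
	pthread_mutex_lock(&fman->lock);
	f->next = fman->head;
	fman->head = f;
	pthread_mutex_unlock(&fman->lock);
	return true;
}

/* Assumption: the release path never takes fman->lock. */
static void model_fence_put(struct model_fman *fman, struct model_fence *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0) {
		fman->freed++;
		free(f);
	}
}

/* Drain the list; returns the number of fences handled. */
static int model_fifo_down(struct model_fman *fman)
{
	int drained = 0;

	pthread_mutex_lock(&fman->lock);
	while (fman->head) {
		struct model_fence *f = fman->head;

		f->refcount++;			/* keep f alive across the wait */
		pthread_mutex_unlock(&fman->lock);
		/* The blocking wait would go here; the model treats it as timed out. */
		pthread_mutex_lock(&fman->lock);

		if (fman->head == f) {		/* still listed: unlink under the lock */
			fman->head = f->next;
			f->next = NULL;
			f->signaled = true;
			model_fence_put(fman, f);	/* drop the list's reference */
		}
		model_fence_put(fman, f);	/* drop our reference, lock still held */
		drained++;
	}
	pthread_mutex_unlock(&fman->lock);
	return drained;
}
```

The only point of the model is the shape of the loop tail: the unlink
and signal happen after the lock is re-taken, and the put needs no
extra unlock/lock pair as long as the release path stays off the lock.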
Matt
> dma_fence_put(&fence->base);
> +
> spin_lock(&fman->lock);
> }
> spin_unlock(&fman->lock);
> --
> 2.51.0
>