On Tue, May 05, 2026 at 06:22:25PM -0400, Zack Rusin wrote:
> vmw_fence_fifo_down() drops fman->lock to wait on a fence and, on
> timeout, mutates fman->fence_list via list_del_init() and signals
> the fence without re-acquiring the lock.  __vmw_fences_update() walks
> and removes entries from the same list under fman->lock from any
> other waiter, the fence-IRQ thread, or vmw_fences_update(), so the
> unlocked list_del_init() can corrupt the list head.
> 
> Re-take fman->lock before manipulating fence->head and use
> dma_fence_signal_locked().  Wrap the locked signalling in
> dma_fence_begin_signalling() / dma_fence_end_signalling() so the
> lockdep annotation that dma_fence_signal() previously provided is
> preserved (the same pattern as __vmw_fences_update()).
> 
> dma_fence_put() is moved outside the lock to avoid a recursive
> acquire from vmw_fence_obj_destroy(), which also takes fman->lock.
> 

I'm looking at this mostly out of curiosity about AI-assisted patches.
This isn't my driver, but from a quick look at the vmwgfx code this
almost certainly looks like a good fix.

> Fixes: ae2a104058e2 ("vmwgfx: Implement fence objects")
> Cc: [email protected]
> Assisted-by: Claude:claude-opus-4.7
> Signed-off-by: Zack Rusin <[email protected]>
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> index 4ef84ff9b638..384c6736cf6b 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> @@ -367,13 +367,24 @@ void vmw_fence_fifo_down(struct vmw_fence_manager *fman)
>               ret = vmw_fence_obj_wait(fence, false, false,
>                                        VMW_FENCE_WAIT_TIMEOUT);
>  
> +             spin_lock(&fman->lock);
>               if (unlikely(ret != 0)) {
> +                     bool cookie = dma_fence_begin_signalling();
> +
>                       list_del_init(&fence->head);
> -                     dma_fence_signal(&fence->base);
> +                     if (fence->waiter_added) {
> +                             vmw_seqno_waiter_remove(fman->dev_priv);
> +                             fence->waiter_added = false;
> +                     }
> +                     dma_fence_signal_locked(&fence->base);
> +                     dma_fence_end_signalling(cookie);
>               }
>  
>               BUG_ON(!list_empty(&fence->head));
> +             spin_unlock(&fman->lock);
> +

You can likely drop the spin_unlock()/spin_lock() dance around the put
here: a put is just a refcount decrement, or vmw_fence_obj_destroy() on
the final put, which seems to resolve to kfree_rcu() in any case. Of
course, if you want to be paranoid, it is perfectly fine to drop and
reacquire the lock.
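
To illustrate the shape I mean, here is a rough userspace sketch, not
vmwgfx code. Every name in it (demo_fence, demo_lock, demo_fence_put,
demo_fifo_down, demo_make_list) is hypothetical, and it assumes the
final release never takes the list lock, which is what the kfree_rcu()
path appears to give you. If the release did take the lock, the put
under a non-recursive lock would deadlock and the unlock/lock pair would
have to stay.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a fence on the manager's pending list. */
struct demo_fence {
	int refcount;              /* only touched under demo_lock here */
	struct demo_fence *next;   /* simple singly linked pending list */
};

pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
int demo_freed;                    /* number of objects released so far */

/*
 * Drop one reference. ASSUMPTION: the final release does not take
 * demo_lock, so calling this with demo_lock held cannot self-deadlock.
 */
void demo_fence_put(struct demo_fence *f)
{
	if (--f->refcount == 0) {
		demo_freed++;
		free(f);
	}
}

/* Drain the list, doing the put without dropping the lock first. */
int demo_fifo_down(struct demo_fence *head)
{
	int handled = 0;

	pthread_mutex_lock(&demo_lock);
	while (head) {
		struct demo_fence *f = head;

		head = f->next;    /* unlink while holding the lock */
		f->next = NULL;
		handled++;
		demo_fence_put(f); /* no unlock/lock pair around the put */
	}
	pthread_mutex_unlock(&demo_lock);
	return handled;
}

/* Build a list of n fences, each holding a single reference. */
struct demo_fence *demo_make_list(int n)
{
	struct demo_fence *head = NULL;

	for (int i = 0; i < n; i++) {
		struct demo_fence *f = malloc(sizeof(*f));

		assert(f != NULL);
		f->refcount = 1;
		f->next = head;
		head = f;
	}
	return head;
}
```

Calling demo_fifo_down(demo_make_list(3)) walks and releases all three
entries in one critical section. The sketch leaves out the wait that the
real loop does with the lock dropped; it only shows the put.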

Matt

>               dma_fence_put(&fence->base);
> +
>               spin_lock(&fman->lock);
>       }
>       spin_unlock(&fman->lock);
> -- 
> 2.51.0
> 
