On Tue, Jan 07, 2025 at 03:58:46PM +1000, Dave Airlie wrote:
> From: Dave Airlie <airl...@redhat.com>
> 
> If we have two nouveau controlled devices and one passes a dma-fence
> to the other, when we hit the sync path it can cause the second device
> to try and put a sync wait in its pushbuf for the seqno of the context
> on the first device.
> 
> Since fence contexts are vmm bound, check if the vmms match between
> both users, this should ensure that fence seqnos don't get used wrongly
> on incorrect channels.

The fence sequence number is global, i.e. per device, hence checking the vmm
context seems too restrictive.

Wouldn't it be better to ensure that `prev->cli->drm == chan->cli->drm`?

This way we can still optimize the case where the dependency is between
different applications running on the same device.

> 
> This seems to happen fairly spuriously and I found it tracking down
> a multi-card regression report, that seems to work by luck before this.
> 
> Signed-off-by: Dave Airlie <airl...@redhat.com>
> Cc: sta...@vger.kernel.org
> ---
>  drivers/gpu/drm/nouveau/nouveau_fence.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
> index ee5e9d40c166f..5743c82f4094b 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fence.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
> @@ -370,7 +370,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan,
>  
>                               rcu_read_lock();
>                               prev = rcu_dereference(f->channel);
> -                             if (prev && (prev == chan ||
> +                             if (prev && (prev->vmm == chan->vmm) &&
> +                                 (prev == chan ||

Maybe better to break it down a bit, e.g.

bool local = prev && (prev->... == chan->...);

if (local && ...) {
...
}
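
To make the suggestion a bit more concrete, here is a rough standalone sketch
of the decision logic, assuming the same-device check proposed above. The
struct definitions are simplified stand-ins rather than the real driver types,
`chan_same_device()` and `fence_must_wait()` are hypothetical names, and
`sync_ret` stands in for the return value of `fctx->sync(f, prev, chan)`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the driver structures (assumption: only the
 * fields needed for this illustration are modelled). */
struct nouveau_drm { int id; };
struct nouveau_cli { struct nouveau_drm *drm; };
struct nouveau_channel { struct nouveau_cli *cli; };

/* Hypothetical helper: true if prev exists and both channels belong to
 * the same device. */
static bool chan_same_device(const struct nouveau_channel *prev,
			     const struct nouveau_channel *chan)
{
	return prev && prev->cli->drm == chan->cli->drm;
}

/* Hypothetical helper mirroring the must_wait decision. sync_ret stands
 * in for fctx->sync(f, prev, chan); unlike the real code, it is evaluated
 * by the caller up front instead of being short-circuited. */
static bool fence_must_wait(const struct nouveau_channel *prev,
			    const struct nouveau_channel *chan,
			    int sync_ret)
{
	bool local = chan_same_device(prev, chan);

	/* Only skip the CPU-side wait if the fence comes from the same
	 * device and is either on this channel or was synced on the GPU. */
	if (local && (prev == chan || sync_ret == 0))
		return false;

	return true;
}
```

With this shape, a fence from a channel on another device always falls back to
waiting, while two clients on the same device can still take the sync path.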

>                                            fctx->sync(f, prev, chan) == 0))
>                                       must_wait = false;
>                               rcu_read_unlock();
> -- 
> 2.43.0
> 
