On 2/10/26 10:14, Pierre-Eric Pelloux-Prayer wrote:
> Invalidating a dmabuf will impact other users of the shared BO.
> In the scenario where process A moves the BO, it needs to inform
> process B about the move and process B will need to update its
> page table.
>
> The commit fixes a synchronisation bug caused by the use of the
> ticket: it made amdgpu_vm_handle_moved behave as if updating
> the page table immediately was correct but in this case it's not.
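
For readers following along, here is a minimal sketch of the decision the ticket influences. It is a toy model, not the driver code: struct toy_ticket, struct toy_resv and toy_must_clear() are made-up names, and the three branches are a simplified reading of how amdgpu_vm_handle_moved picks between clearing the PTEs and rewriting them right away.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a locking context; only its address matters. */
struct toy_ticket { int unused; };

/* Toy model of the reservation object of one invalidated BO. */
struct toy_resv {
	bool locked;                    /* somebody holds the reservation lock */
	const struct toy_ticket *owner; /* locking context of the holder, if any */
};

/*
 * Returns true when the PTEs should only be cleared (the real update is
 * deferred), false when they may be rewritten to the new location now.
 */
bool toy_must_clear(const struct toy_resv *resv,
		    const struct toy_ticket *ticket)
{
	if (!resv->locked)
		return false; /* a trylock would succeed */
	if (ticket && resv->owner == ticket)
		return false; /* the caller holds the lock itself */
	return true;          /* somebody else is using the BO */
}
```

In this model, a BO locked by the mover's ticket is updated immediately when that ticket is passed in, and only cleared when NULL is passed instead, which is the behaviour the patch relies on.
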
>
> An example is the following scenario, with 2 GPUs and glxgears
> running on GPU0 and Xorg running on GPU1, on a system where P2P
> PCI isn't supported:
>
> glxgears:
> export linear buffer from GPU0 and import using GPU1
> submit frame rendering to GPU0
> submit tiled->linear blit
> Xorg:
> copy of linear buffer
>
> The sequence of jobs would be:
> drm_sched_job_run # GPU0, frame rendering
> drm_sched_job_queue # GPU0, blit
> drm_sched_job_done # GPU0, frame rendering
> drm_sched_job_run # GPU0, blit
> move linear buffer for GPU1 access #
> amdgpu_dma_buf_move_notify -> update pt # GPU0
>
> At this point the blit job on GPU0 is still running and would
> likely produce a page fault.
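
The ordering problem in the sequence above can be sketched as a small check over an event list. This is purely illustrative: enum toy_event and toy_update_races_with_job() are hypothetical names, and the model only tracks whether any job is still executing when the page table gets rewritten.

```c
#include <stdbool.h>
#include <stddef.h>

/* Events of the toy timeline, loosely mirroring the trace in the commit message. */
enum toy_event {
	TOY_JOB_QUEUE, /* job queued, not yet executing */
	TOY_JOB_RUN,   /* job starts executing on the GPU */
	TOY_JOB_DONE,  /* job finished */
	TOY_PT_UPDATE, /* page table rewritten for the moved BO */
};

/* Returns true if a page table update lands while a job is still executing. */
bool toy_update_races_with_job(const enum toy_event *ev, size_t n)
{
	int running = 0;

	for (size_t i = 0; i < n; i++) {
		switch (ev[i]) {
		case TOY_JOB_QUEUE:
			break; /* queued jobs do not touch the mappings yet */
		case TOY_JOB_RUN:
			running++;
			break;
		case TOY_JOB_DONE:
			running--;
			break;
		case TOY_PT_UPDATE:
			if (running > 0)
				return true;
			break;
		}
	}
	return false;
}
```

Feeding it run, queue, done, run, update (the trace from the commit message) reports a race; placing the update after the second done does not.
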
>
> Fixes: a448cb003edc ("drm/amdgpu: implement amdgpu_gem_prime_move_notify v2")
CC: stable?
> Signed-off-by: Pierre-Eric Pelloux-Prayer <[email protected]>
Reviewed-by: Christian König <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> index b9c38a4fe546..656c267dbe58 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> @@ -514,8 +514,15 @@ amdgpu_dma_buf_move_notify(struct dma_buf_attachment *attach)
> r = dma_resv_reserve_fences(resv, 2);
> if (!r)
> r = amdgpu_vm_clear_freed(adev, vm, NULL);
> +
> + /* Don't pass 'ticket' to amdgpu_vm_handle_moved: we want the clear=true
> + * path to be used otherwise we might update the PT of another process
> + * while it's using the BO.
> + * With clear=true, amdgpu_vm_bo_update will sync to command submission
> + * from the same VM.
> + */
> if (!r)
> - r = amdgpu_vm_handle_moved(adev, vm, ticket);
> + r = amdgpu_vm_handle_moved(adev, vm, NULL);
>
> if (r && r != -EBUSY)
> DRM_ERROR("Failed to invalidate VM page tables (%d))\n",