On Thu, Nov 6, 2025 at 12:21 PM <[email protected]> wrote:
>
> From: Vitaly Prosyak <[email protected]>
>
> Certain multi-GPU configurations (notably GFX12) can suffer data
> corruption when a DCC-compressed VRAM surface is shared between GPUs
> via peer-to-peer (P2P) DMA transfers.
>
> Such surfaces rely on device-local metadata and cannot be safely accessed
> through a remote GPU’s page tables. Attempting to import a DCC-enabled
> surface through P2P leads to incorrect rendering or GPU faults.
>
> This change disables P2P for DCC-enabled VRAM buffers (BOs created with
> AMDGPU_GEM_CREATE_GFX12_DCC) on GFX12+ hardware. In these cases, the
> importer falls back to the standard system-memory path, avoiding invalid
> access to compressed surfaces.
>
> Future work could consider optional migration (VRAM→System→VRAM) if a
> performance regression is observed when `attach->peer2peer = false`.
>
> Tested on:
>  - Dual RX 9700 XT (Navi4x) setup
>  - GNOME and Wayland compositor scenarios
>  - Confirmed no corruption after disabling P2P under these conditions
>
> Suggested-by: Christian König <[email protected]>
> Signed-off-by: Vitaly Prosyak <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> index 9a0bce3ba24c..d2d31031f672 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> @@ -260,11 +260,24 @@ static int amdgpu_dma_buf_attach(struct dma_buf *dmabuf,
>         struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>
>  #ifdef HAVE_STRUCT_DMA_BUF_ATTACH_OPS_ALLOW_PEER2PEER

This patch is against the DKMS tree; for upstream, please rebase it
before you commit.  With that fixed:
Reviewed-by: Alex Deucher <[email protected]>


> +       /*
> +        * Disable peer-to-peer access for DCC-enabled VRAM surfaces on GFX12+.
> +        * Such buffers cannot be safely accessed over P2P because their
> +        * compression metadata is device-local; fall back to the
> +        * system-memory path instead. This applies when:
> +        *  - the device is GFX12 or newer (GC 12.x), and
> +        *  - the BO was created with the AMDGPU_GEM_CREATE_GFX12_DCC flag.
> +        */
> +       if ((adev->ip_versions[GC_HWIP][0] >= IP_VERSION(12, 0, 0)) &&
> +           (bo->flags & AMDGPU_GEM_CREATE_GFX12_DCC)) {
> +               attach->peer2peer = false;
> +               goto update_vm;
> +       }
>         if (!amdgpu_dmabuf_is_xgmi_accessible(attach_adev, bo) &&
>             pci_p2pdma_distance(adev->pdev, attach->dev, false) < 0)
>                 attach->peer2peer = false;
>  #endif
> -
> +update_vm:
>         amdgpu_vm_bo_update_shared(bo);
>
>         return 0;
> --
> 2.51.2
>
