On 24/06/2026 16:42, Timur Kristóf wrote:
On 2026. június 24., szerda 17:14:59 közép-európai nyári idő Tvrtko Ursulin
wrote:
On 24/06/2026 15:52, Timur Kristóf wrote:
On 2026. június 24., szerda 16:31:20 közép-európai nyári idő Tvrtko
Ursulin

wrote:
On 29/05/2026 11:30, Timur Kristóf wrote:
Add a fence callback to the VM update and ACK the retry CAM
after the VM update is finished. Previously, we would ACK it
immediately after calling amdgpu_vm_handle_fault() which
caused a race condition that was likely to trigger the same
interrupt again, causing the same fault to be handled
multiple times.

Signed-off-by: Timur Kristóf <[email protected]>
---

    drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c     | 28
    +++++++++++++++++++--
    drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h     |  8 ++++++
    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c      |  2 +-
    drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  2 +-
    4 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c index
26aea960e2759..21c8d87477448 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -545,6 +545,16 @@ void amdgpu_gmc_filter_faults_remove(struct
amdgpu_device *adev, uint64_t addr,>

        } while (fault->timestamp < tmp);
}

+static void amdgpu_gmc_retry_fault_handled(struct dma_fence *fence,
+                                          struct dma_fence_cb

*cb)

+{
+       struct amdgpu_fence_cb *afc = container_of(cb, struct

amdgpu_fence_cb,

cb); +  struct amdgpu_device *adev = afc->adev;
+
+       /* CAM index is the array index of the current callback struct */
+       adev->irq.ih_funcs->retry_cam_ack(adev, afc - &adev-

gmc.retry_cb[0]);

Is the "afc - &adev->gmc.retry_cb[0]" part correct? It will be the index
of the array element, while ->retry_cam_ack() expects the content of
that element, no?

Like the comment says, the CAM index is the array index.
We just need the CAM index in order to tell the CAM to ACK the current
entry. The contents of the array are just there to make
dma_fence_add_callback() work with this callback function.

Ah you are right, I got confused. But it is also a bit bad, and I mean
not just the array sizing dilema from lower in the email. But since the
cam_index comes from the hardware and then below we blindly do:

        if (dma_fence_add_callback(fence, &adev-
gmc.retry_cb[cam_index].cb,
amdgpu_gmc_retry_fault_handled))

Should hardware manage to send two faults with the same cam_index when
the previous one hasn't been handled

The retry CAM exists to filter page fault interrupts and prevent sending
multiple interrupts for the same fault. It won't send and interrupt with the
same cam_index until we ACK the previous one.

Okay, but at least kernel needs to be defensive and check to avoid a crash.

that is the very same callback is
already installed and unsignaled (expect the unexpected), we have just
upgraded the hardware bug to a kernel crash.

If I now understand it right, you want to "remember" the cam_index
received so callback knows what to handle. Hmm.. Allocating memory does
seem allowed if I follow correctly that amdgpu_vm_handle_fault() is
calling dma_resv_reserve_fences(). So unless I am missing something
perhaps kmalloc of struct amdgpu_fence_cb would be fine after all?

It may be fine, but I'd very much prefer to avoid it if possible.

One option is to not free the callback struct upon handling it, but stash it somewhere so the following interrupt can simply take it. You can coordinate using cmpxchg for example.

So on the interrupt arriving the flow would be:

if "grab previously stashed unused callback via cmpxchg"
else
        "allocate a new one"

On signalling:

cmpxchg to store the executed callback into the slot and free what was in the slot, if anything.

Slot possibly goes into struct gmc.

If you want to optimise for multiple parallel cam indices you would need multiple slots. Maybe hash the index read from hardware to limit the number of slots placeholds to less than 1024 that you mentioned. Depends how many parallel interrupts you want to handle on the fast path (no allocations).

And free the used slot(s) on driver exit of course.

Regards,

Tvrtko

And
if so you should also probably rename it to a less generic name along
the lines of amgpud_retry_fault_cb or so. Workable?

I'm OK to rename it for sure.

Timur


+}
+

    int amdgpu_gmc_handle_retry_fault(struct amdgpu_device *adev,
struct amdgpu_iv_entry *entry,
                                  u64 addr,

@@ -552,6 +562,7 @@ int amdgpu_gmc_handle_retry_fault(struct
amdgpu_device
*adev,>

                                  u32 node_id,
                                  bool write_fault)
{

+       struct dma_fence *fence = NULL;

        int ret;
        
        if (adev->irq.retry_cam_enabled) {

@@ -564,8 +575,21 @@ int amdgpu_gmc_handle_retry_fault(struct
amdgpu_device *adev,>

                }
                
                ret = amdgpu_vm_handle_fault(adev, entry->pasid,

entry->vmid, node_id,

-                                            addr, entry-

timestamp, write_fault, NULL);

-               adev->irq.ih_funcs->retry_cam_ack(adev, cam_index);
+                                            addr, entry-

timestamp, write_fault, &fence);

+
+               /* If the update is already done, ACK now, otherwise

when it's done. */

+               if (fence) {
+                       adev->gmc.retry_cb[cam_index].adev = adev;

Why is 16 retry_cb elements enough? I see in the code cam_index extraced
from the IV entry with a mask such as 0x3ff.

I think this came up in a conversation after I had already submitted the
patch. The maximum amount of CAM entries are specified by the
IH_RETRY_INT_CAM_CNTL.CAM_SIZE field.  The content of the field will need
to be interpreted as something like this:
((CAM_SIZE + 1) * 64) = (15 + 1) * 64 = 1024

It is a good question whether we actually want to statically allocate that
many items. We should very much avoid doing dynamic allocation in the page
fault handler. I'm open to suggestions on how to move forward with this.

+
+                       if (dma_fence_add_callback(fence, &adev-

gmc.retry_cb[cam_index].cb,

+

amdgpu_gmc_retry_fault_handled))

+                               adev->irq.ih_funcs-

retry_cam_ack(adev, cam_index);

+
+                       dma_fence_put(fence);
+               } else {
+                       adev->irq.ih_funcs->retry_cam_ack(adev,

cam_index);

+               }
+

                if (ret)
                
                        return 1;
        
        } else {

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h index
77eb153802845..3bfb06e011a86 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -27,6 +27,7 @@

    #define __AMDGPU_GMC_H__
#include <linux/types.h>

+#include <linux/dma-fence.h>

    #include "amdgpu_irq.h"
    #include "amdgpu_xgmi.h"

@@ -214,6 +215,11 @@ struct amdgpu_gmc_memrange {

        int nid_mask;
};

+struct amdgpu_fence_cb {
+       struct amdgpu_device *adev;
+       struct dma_fence_cb cb;
+};
+

    enum amdgpu_gart_placement {
AMDGPU_GART_PLACEMENT_BEST_FIT = 0,
        AMDGPU_GART_PLACEMENT_HIGH,

@@ -305,6 +311,8 @@ struct amdgpu_gmc {

        } fault_hash[AMDGPU_GMC_FAULT_HASH_SIZE];
        uint64_t                last_fault:AMDGPU_GMC_FAULT_RING_ORDER;

+       struct amdgpu_fence_cb retry_cb[16];
+

        bool tmz_enabled;
        bool is_app_apu;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index
8c3ba7213eb22..f5e9b97e92a8c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3035,7 +3035,7 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device
*adev, u32 pasid,>

        r = amdgpu_vm_update_pdes(adev, vm, true);

-       *fence = vm->last_update;
+       *fence = dma_fence_get(vm->last_update);

Ah! But passing over since you said you are dropping that patch anyway.

That line should have gone to the previous patch and was added to this one
by mistake.

    error_unlock:
        amdgpu_bo_unreserve(root);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c index
2eb64df6daa94..6e28f0e435bf5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -132,7 +132,7 @@ static int amdgpu_vm_sdma_commit(struct
amdgpu_vm_update_params *p,>

                                   DMA_RESV_USAGE_BOOKKEEP);
        
        }

-       if (fence && !p->immediate) {
+       if (fence) {

Is this deliberate and if so what it is about? Commit message should
explain it as well.

The reason it is changed is because previously it wouldn't return a fence
in immediate mode. This line also should have gone to the previous patch
and was added to this one by mistake.

Thanks & best regards,
Timur





Reply via email to