gmc: Don't compare page fault timestamps with other interrupts

Tvrtko Ursulin Mon, 15 Jun 2026 08:24:02 -0700


On 15/06/2026 15:52, Timur Kristóf wrote:

On Monday, June 15, 2026 4:32:23 PM Central European Summer Time Tvrtko
Ursulin wrote:

On 25/05/2026 12:45, Timur Kristóf wrote:

Different interrupts may have different timestamp sources,
which shouldn't be compared.

If we compare the timestamps of retry faults to timestamps
of other interrupts, it may result in all retry fault
interrupts being filtered out, because of the different
time stamp source.

This issue was observed on Strix Halo.
Solved by storing the timestamp of the last page fault interrupt.

Hi,

This one may require access to AMD docs to review. For example I am
immediately curious as to how many different clock sources on a single
IH there are


As far as I know there are various timestamp sources in the GPU and some
interrupts use different ones. I am not aware of any documentation on this
topic, unfortunately.

how does that relate to the timestamp_src field


The timestamp_src field is set differently when the timestamp source is
different. So, it could happen that we accidentally filter out all page faults
when we shouldn't.

and if there are indeed multiple clock domains should the patch perhaps be
generalized to something like
ih->processed_timestamp[entry->timestamp_src] or something?


For the context of this patch, I think it doesn't matter how many different
kinds of time stamps there are. What's important is that we just shouldn't
compare timestamps of page faults with time stamps of other interrupts.


True, thank you!

Another question is why the backward timestamp check is needed only forfault interrupts? I do not see it elsewhere.


Let me also ask two more things below.

As far as I see the timestamp doesn't really matter for other interrupts as we
only use it to filter out page faults and nothing else.

Hope this helps,
Timur

---

   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 5 ++++-
   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 +
   2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c index
13bec8461cde..52258f1341c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -437,9 +437,12 @@ bool amdgpu_gmc_filter_faults(struct amdgpu_device
*adev,>
        uint32_t hash;
        
        /* Stale retry fault if timestamp goes backward */

-       if (amdgpu_ih_ts_after(timestamp, ih->processed_timestamp))
+       if (timestamp == adev->gmc.processed_fault_timestamp ||
+               amdgpu_ih_ts_after(timestamp, adev-

gmc.processed_fault_timestamp))

First thing is whether you are confident the equality check is eithersafe or required?

For example can two blocks fault with the same timestamp on differentaddresses?

Or from a different angle, is the clock granularity good enough to notcoalesce two separate faults to a single timestamp?


                return true;

+       adev->gmc.processed_fault_timestamp = MAX(timestamp,
adev->gmc.processed_fault_timestamp); +

Doesn't a plain assign work here? The if above has already verified newtimestamp is larger than the old.


Regards,

Tvrtko


        /* If we don't have space left in the ring buffer return

immediately */

        stamp = max(timestamp, AMDGPU_GMC_FAULT_TIMEOUT + 1) -
        
                AMDGPU_GMC_FAULT_TIMEOUT;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h index
676e3aaa1f27..77eb15380284 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -361,6 +361,7 @@ struct amdgpu_gmc {

        u64 noretry_flags;
        u64 init_pte_flags;

+       u64 processed_fault_timestamp;

        bool flush_tlb_needs_extra_type_0;
        bool flush_tlb_needs_extra_type_2;

Re: [PATCH 3/7] drm/amdgpu/gmc: Don't compare page fault timestamps with other interrupts

Reply via email to