On 15/06/2026 15:52, Timur Kristóf wrote:
On Monday, June 15, 2026 4:32:23 PM Central European Summer Time Tvrtko
Ursulin wrote:
On 25/05/2026 12:45, Timur Kristóf wrote:
Different interrupts may have different timestamp sources,
which shouldn't be compared.
If we compare the timestamps of retry faults to timestamps
of other interrupts, it may result in all retry fault
interrupts being filtered out, because of the different
time stamp source.
This issue was observed on Strix Halo.
Solved by storing the timestamp of the last page fault interrupt.
Hi,
This one may require access to AMD docs to review. For example I am
immediately curious as to how many different clock sources on a single
IH there are
As far as I know there are various timestamp sources in the GPU and some
interrupts use different ones. I am not aware of any documentation on this
topic, unfortunately.
how does that relate to the timestamp_src field
The timestamp_src field is set differently when the timestamp source is
different. So, it could happen that we accidentally filter out all page faults
when we shouldn't.
and if there are indeed multiple clock domains should the patch perhaps be
generalized to something like
ih->processed_timestamp[entry->timestamp_src] or something?
For the context of this patch, I think it doesn't matter how many different
kinds of time stamps there are. What's important is that we just shouldn't
compare timestamps of page faults with time stamps of other interrupts.
True, thank you!
Another question is why the backward timestamp check is needed only for
fault interrupts? I do not see it elsewhere.
Let me also ask two more things below.
As far as I see the timestamp doesn't really matter for other interrupts as we
only use it to filter out page faults and nothing else.
Hope this helps,
Timur
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 5 ++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 +
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c index
13bec8461cde..52258f1341c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -437,9 +437,12 @@ bool amdgpu_gmc_filter_faults(struct amdgpu_device
*adev,>
uint32_t hash;
/* Stale retry fault if timestamp goes backward */
- if (amdgpu_ih_ts_after(timestamp, ih->processed_timestamp))
+ if (timestamp == adev->gmc.processed_fault_timestamp ||
+ amdgpu_ih_ts_after(timestamp, adev-
gmc.processed_fault_timestamp))
First thing is whether you are confident the equality check is either
safe or required?
For example can two blocks fault with the same timestamp on different
addresses?
Or from a different angle, is the clock granularity good enough to not
coalesce two separate faults to a single timestamp?
return true;
+ adev->gmc.processed_fault_timestamp = MAX(timestamp,
adev->gmc.processed_fault_timestamp); +
Doesn't a plain assign work here? The if above has already verified new
timestamp is larger than the old.
Regards,
Tvrtko
/* If we don't have space left in the ring buffer return
immediately */
stamp = max(timestamp, AMDGPU_GMC_FAULT_TIMEOUT + 1) -
AMDGPU_GMC_FAULT_TIMEOUT;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h index
676e3aaa1f27..77eb15380284 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -361,6 +361,7 @@ struct amdgpu_gmc {
u64 noretry_flags;
u64 init_pte_flags;
+ u64 processed_fault_timestamp;
bool flush_tlb_needs_extra_type_0;
bool flush_tlb_needs_extra_type_2;