On Monday, June 15, 2026 4:32:23 PM Central European Summer Time Tvrtko 
Ursulin wrote:
> On 25/05/2026 12:45, Timur Kristóf wrote:
> > Different interrupts may have different timestamp sources,
> > which shouldn't be compared.
> > 
> > If we compare the timestamps of retry faults to timestamps
> > of other interrupts, it may result in all retry fault
> > interrupts being filtered out, because of the different
> > time stamp source.
> > 
> > This issue was observed on Strix Halo.
> > Solved by storing the timestamp of the last page fault interrupt.
> 

Hi,

> This one may require access to AMD docs to review. For example I am
> immediately curious as to how many different clock sources on a single
> IH there are

As far as I know there are various timestamp sources in the GPU and some 
interrupts use different ones. I am not aware of any documentation on this 
topic, unfortunately.

> how does that relate to the timestamp_src field

The timestamp_src field is set differently when the timestamp source is 
different. So, it could happen that we accidentally filter out all page faults 
when we shouldn't.

> and if there are indeed multiple clock domains should the patch perhaps be
> generalized to something like
> ih->processed_timestamp[entry->timestamp_src] or something?

For the context of this patch, I think it doesn't matter how many different 
kinds of time stamps there are. What's important is that we just shouldn't 
compare timestamps of page faults with time stamps of other interrupts.

As far as I see the timestamp doesn't really matter for other interrupts as we 
only use it to filter out page faults and nothing else.

Hope this helps,
Timur

> > ---
> > 
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 5 ++++-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 +
> >   2 files changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c index
> > 13bec8461cde..52258f1341c2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> > @@ -437,9 +437,12 @@ bool amdgpu_gmc_filter_faults(struct amdgpu_device
> > *adev,> 
> >     uint32_t hash;
> >     
> >     /* Stale retry fault if timestamp goes backward */
> > 
> > -   if (amdgpu_ih_ts_after(timestamp, ih->processed_timestamp))
> > +   if (timestamp == adev->gmc.processed_fault_timestamp ||
> > +           amdgpu_ih_ts_after(timestamp, adev-
>gmc.processed_fault_timestamp))
> > 
> >             return true;
> > 
> > +   adev->gmc.processed_fault_timestamp = MAX(timestamp,
> > adev->gmc.processed_fault_timestamp); +
> > 
> >     /* If we don't have space left in the ring buffer return 
immediately */
> >     stamp = max(timestamp, AMDGPU_GMC_FAULT_TIMEOUT + 1) -
> >     
> >             AMDGPU_GMC_FAULT_TIMEOUT;
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h index
> > 676e3aaa1f27..77eb15380284 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
> > @@ -361,6 +361,7 @@ struct amdgpu_gmc {
> > 
> >     u64 noretry_flags;
> >     u64 init_pte_flags;
> > 
> > +   u64 processed_fault_timestamp;
> > 
> >     bool flush_tlb_needs_extra_type_0;
> >     bool flush_tlb_needs_extra_type_2;




Reply via email to