On 6/16/26 13:17, Timur Kristóf wrote:
> On Tuesday, June 16, 2026 12:15:12 PM Central European Summer Time Christian 
> König wrote:
...
>> The check was added with patch to make retry page faults more resilent to IH
>> ring buffer overflow: commit 3c2d6ea27955cfac8590884d207353eece8c2cee
>> Author: Philip Yang <[email protected]>
>> Date:   Thu Nov 18 15:24:55 2021 -0500
>>
>>     drm/amdgpu: handle IH ring1 overflow
>>
>>     IH ring1 is used to process GPU retry fault, overflow is enabled to
>>     drain retry fault because we want receive other interrupts while
>>     handling retry fault to recover range. There is no overflow flag set
>>     when wptr pass rptr. Use timestamp of rptr and wptr to handle overflow
>>     and drain retry fault.
>>
>>     If fault timestamp goes backward, the fault is filtered and should not
>>     be processed. Drain fault is finished if processed_timestamp is equal to
>> or larger than checkpoint timestamp.
>>
>>     Add amdgpu_ih_functions interface decode_iv_ts for different chips to
>>     get timestamp from IV entry with different iv size and timestamp offset.
>> amdgpu_ih_decode_iv_ts_helper is used for vega10, vega20, navi10.
>>
>>     Signed-off-by: Philip Yang <[email protected]>
>>     Reviewed-by: Felix Kuehling <[email protected]>
>>     Acked-by: Christian König <[email protected]>
>>     Signed-off-by: Alex Deucher <[email protected]>
>>
>> But as far as I can see the whole idea is completely broken. The timestamp
>> can also go backward in case of a reset for example.
>>
>> So having this check like this is clearly a bad idea.
>>
>> What we could do in amdgpu_gmc_filter_faults() is to check some range for
>> the timestamp, e.g. last seen timestamp (in amdgpu_gmc_filter_faults(),
>> e.g. only faults) - value X is considered a duplicate caused by ring buffer
>> wrap around.
>>
> 
> Hi,
> 
> I agree that filtering based on timestamps is not a good idea in general, and 
> we should probably re-think it. I wish someone had thought of that in 2021 
> when the above patch was accepted.

I remember that I brought up some concerns about that when we originally 
reviewed the patch, but it was quite an urgent fix (as always) so we accepted 
it in the end.

> Maybe this patch should be dropped and the timestamp check should be removed 
> instead. What do you guys think about that?

That's fine with me but adding Philip just in case we could run into 
regressions with that.

Regards,
Christian.

> 
> Best regards,
> Timur

Reply via email to