On 6/16/26 13:17, Timur Kristóf wrote: > On Tuesday, June 16, 2026 12:15:12 PM Central European Summer Time Christian > König wrote: ... >> The check was added with patch to make retry page faults more resilent to IH >> ring buffer overflow: commit 3c2d6ea27955cfac8590884d207353eece8c2cee >> Author: Philip Yang <[email protected]> >> Date: Thu Nov 18 15:24:55 2021 -0500 >> >> drm/amdgpu: handle IH ring1 overflow >> >> IH ring1 is used to process GPU retry fault, overflow is enabled to >> drain retry fault because we want receive other interrupts while >> handling retry fault to recover range. There is no overflow flag set >> when wptr pass rptr. Use timestamp of rptr and wptr to handle overflow >> and drain retry fault. >> >> If fault timestamp goes backward, the fault is filtered and should not >> be processed. Drain fault is finished if processed_timestamp is equal to >> or larger than checkpoint timestamp. >> >> Add amdgpu_ih_functions interface decode_iv_ts for different chips to >> get timestamp from IV entry with different iv size and timestamp offset. >> amdgpu_ih_decode_iv_ts_helper is used for vega10, vega20, navi10. >> >> Signed-off-by: Philip Yang <[email protected]> >> Reviewed-by: Felix Kuehling <[email protected]> >> Acked-by: Christian König <[email protected]> >> Signed-off-by: Alex Deucher <[email protected]> >> >> But as far as I can see the whole idea is completely broken. The timestamp >> can also go backward in case of a reset for example. >> >> So having this check like this is clearly a bad idea. >> >> What we could do in amdgpu_gmc_filter_faults() is to check some range for >> the timestamp, e.g. last seen timestamp (in amdgpu_gmc_filter_faults(), >> e.g. only faults) - value X is considered a duplicate caused by ring buffer >> wrap around. >> > > Hi, > > I agree that filtering based on timestamps is not a good idea in general, and > we should probably re-think it. I wish someone had thought of that in 2021 > when the above patch was accepted.
I remember that I brought up some concerns about that when we originally reviewed the patch, but it was quite an urgent fix (as always) so we accepted it in the end. > Maybe this patch should be dropped and the timestamp check should be removed > instead. What do you guys think about that? That's fine with me but adding Philip just in case we could run into regressions with that. Regards, Christian. > > Best regards, > Timur
