On 6/2/26 07:03, Runyu Xiao wrote:
> mes_v11_0_queue_init() resets ring->wptr_cpu_addr with a plain 32-bit
> store in the reset/suspend path even though the same carrier is
> accessed with atomic64_set()/atomic64_read() and support_64bit_ptrs is
> enabled.
>
> This is not just a missing atomic annotation. The MES queue write
> pointer is a shared 64-bit carrier, and *ring->wptr_cpu_addr = 0 only
> clears the low 32 bits. A later atomic64_read() can then observe stale
> high 32 bits instead of a real zeroed reset state.
>
> Use atomic64_set((atomic64_t *)ring->wptr_cpu_addr, 0) so the reset
> path updates the full 64-bit wptr with the same access family as the
> existing readers and writers.
>
> Build-tested by compiling mes_v11_0.o.
>
> No AMDGPU hardware was available for end-to-end runtime testing.
Clear NAK.
The atomic64_t cast hack is something we only did for older hardware
generations; it is neither necessary nor something that should be done here.
What could be possible is that we need to use amdgpu_ring_set_wptr() here to
correctly distinguish between queues with 32-bit and 64-bit wptrs.
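As a rough userspace illustration of that 32-bit versus 64-bit distinction
(a sketch only, not driver code): struct demo_ring and every demo_* helper
below are hypothetical names invented for this example, and nothing here
claims to mirror how amdgpu_ring_set_wptr() is actually implemented. It only
shows that a 32-bit store into a 64-bit slot leaves half of the slot
untouched, while a store that respects the queue's pointer width does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a ring; NOT the real struct amdgpu_ring. */
struct demo_ring {
	uint64_t wptr_slot;      /* 64-bit backing slot for the write pointer */
	int support_64bit_ptrs;  /* nonzero if the queue uses a 64-bit wptr */
};

/* 32-bit store: only the first four bytes of the slot are written. */
static void demo_store_wptr32(struct demo_ring *ring, uint32_t val)
{
	memcpy(&ring->wptr_slot, &val, sizeof(val));
}

/* 64-bit store: the whole slot is written. */
static void demo_store_wptr64(struct demo_ring *ring, uint64_t val)
{
	ring->wptr_slot = val;
}

/* Width-aware store that dispatches on the queue type (assumed design). */
static void demo_set_wptr(struct demo_ring *ring, uint64_t val)
{
	assert(ring != NULL);
	if (ring->support_64bit_ptrs)
		demo_store_wptr64(ring, val);
	else
		demo_store_wptr32(ring, (uint32_t)val);
}

/* Slot value after a 32-bit-only reset of an all-ones 64-bit slot. */
static uint64_t demo_partial_reset(void)
{
	struct demo_ring ring = { .wptr_slot = UINT64_MAX, .support_64bit_ptrs = 1 };

	demo_store_wptr32(&ring, 0);
	return ring.wptr_slot;
}

/* Slot value after a width-aware reset of an all-ones 64-bit slot. */
static uint64_t demo_full_reset(void)
{
	struct demo_ring ring = { .wptr_slot = UINT64_MAX, .support_64bit_ptrs = 1 };

	demo_set_wptr(&ring, 0);
	return ring.wptr_slot;
}
```

Calling demo_partial_reset() returns a nonzero value on any byte order,
because four of the eight bytes keep their old contents, whereas
demo_full_reset() returns 0. Which half survives the partial store depends
on endianness, so the sketch deliberately does not assert a specific value.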
Regards,
Christian.
>
> Fixes: d81d75c99936 ("drm/amdgpu/gfx11: enable kiq to map mes ring")
> Cc: [email protected]
> Signed-off-by: Runyu Xiao <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> index a926a3307..e2f762c2e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> @@ -1308,7 +1308,7 @@ static int mes_v11_0_queue_init(struct amdgpu_device
> *adev,
>
> if ((pipe == AMDGPU_MES_SCHED_PIPE) &&
> (amdgpu_in_reset(adev) || adev->in_suspend)) {
> - *(ring->wptr_cpu_addr) = 0;
> + atomic64_set((atomic64_t *)ring->wptr_cpu_addr, 0);
> *(ring->rptr_cpu_addr) = 0;
> amdgpu_ring_clear_ring(ring);
> }
> --
> 2.34.1