On Fri, Jul 11, 2025 at 9:58 AM Tvrtko Ursulin <tvrtko.ursu...@igalia.com> wrote: > > > On 11/07/2025 14:39, Alex Deucher wrote: > > On Fri, Jul 11, 2025 at 9:22 AM Tvrtko Ursulin > > <tvrtko.ursu...@igalia.com> wrote: > >> > >> > >> On 11/07/2025 13:45, Christian König wrote: > >>> On 11.07.25 14:23, Tvrtko Ursulin wrote: > >>>> Commit > >>>> 94b1e028e15c ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks") > >>>> added a workaround which disables GFXOFF for the duration of the job > >>>> submit stage (with a 100ms trailing hysteresis). > >>>> > >>>> Empirically the GFXOFF disable/enable request can suffer from significant > >>>> latencies (2ms is easily seen) which are then inserted onto the > >>>> amdgpu_job_run() path, which slows down the CPU submission of ready jobs. > >>>> > >>>> 1) > >>>> If the premise of the GFXOFF workaround is to keep it disabled while the > >>>> SDMA engine is active, the current workaround achieves that only > >>>> partially, for submissions and jobs which take less than 100ms (the > >>>> GFXOFF > >>>> re-enable hysteresis), counting from the ring write phase, up to > >>>> completion. > >>>> > >>>> 2) > >>>> If disabling GFXOFF affects the GFX engine too, basing the workaround > >>>> solely on the SDMA activity creates, at minimum, a needless "chatter" on > >>>> the SMU communication channel. > >>> > >>> IIRC that is intentional. This "needless" chatter is what the workaround > >>> was all about. > >> > >> I tried to gather knowledge to how the hardware works from the comment > >> in sdma_v5_2_ring_begin_use(). Maybe I got it wrong so bear with me please. > >> > >> To try and explain my questions better. If the GFX ring/engine is busy > >> is there a point for SDMA to be requesting GFXOFF enable/disable? Or > >> maybe with diagrams... > >> > >> 1) > >> > >> SDMA: > >> > >> ring-write ring-commit job-execute job-done > >> gfxoff-off-req gfxoff-on-req >100ms -> gfxoff-on > >> > >> Was the workaround prematurely dropped in this case (aka is > >> ring->funcs->end_use() the right place to drop it from)? Probably > >> theoretical that a SDMA job takes more than 100ms but I am trying to > >> understand it all. > >> > > > > The firmware controls the power to a subset of the chip which contains > > both gfx and sdma. Normally the firmware dynamically powers up and > > down gfx transparently when doorbells come in or the engines go idle > > for either engine. amdgpu_gfx_off_ctrl() tells the firmware to allow > > or disallow gfxoff entry. So what this workaround does is disallow > > gfxoff (which results in gfx being powered up) before we touch SDMA. > > Once SDMA is active, we can allow gfxoff again as it will dynamically > > Hmm so it is "once" and not "while", as the comment says: > > /* SDMA 5.2.3 (RMB) FW doesn't seem to properly > * disallow GFXOFF in some cases leading to > * hangs in SDMA. Disallow GFXOFF while SDMA is active. > > ? > > And for "once active" amdgpu_ring_commit() is what it counts?
Yes, amdgpu_ring_commit() rings the doorbell and at that point the engine starts running and SDMA is active. > > > be disabled once GFX/SDMA is no longer active. In this particular > > case there was a race condition somewhere in the internal handshaking > > with SDMA which led to SDMA missing doorbells sometimes and not > > executing the job even if there was work in the ring. > > Thank you, more or less than what I assumed. > > But in this case there should be no harm in holding GFXOFF disabled > until the job completes (like this patch)? Only a win to avoid the SMU > communication latencies while unit is powered on anyway. The extra latency is only on the CPU side, once the amdgpu_ring_commit() is called the SDMA engine is already working. Plus, the sooner you allow gfxoff again, the sooner it can start kicking in again. Alex > > Regards, > > Tvrtko > > >> 2) > >> > >> > >> GFX: > >> > >> +----- job executing --------------------------------------+ > >> > >> SDMA: > >> > >> ring-write ring-commit job-execute job-done > >> gfxoff-off-req gfxoff-on-req >100ms -> gfxoff-on > >> > >> > >> Is it required for the SDMA activity to cause SMU message traffic in > >> this case, or is the powerdomain implied to be on (GFXOFF cannot turn it > >> off while GFX is active)? > >> > >> This is the case I measured latency spikes. While the GFX load was > >> running I was seeing SDMA->run_job() spike and traced it to the SMU > >> communication. > >> > >> Hence the idea from the patch - prevent adev->gfx.gfx_off_req_count > >> dropping to zero until both GFX and SDMA are idle. > >> > >> https://imgshare.cc/rdxu2bjl > >> > >> Above is visual representation of these latencies. Y is SDMA run_job() > >> duration in micro-seconds, X is seconds wall time. Blue is stock kernel, > >> orange is with this patch. X goes for ~60 seconds, which is how long > >> Cyberpunk 2077 benchmark is. > >> > >> Regards, > >> > >> Tvrtko > >> > >>>> If 1) and 2) hold true, we can improve on the workaround by; a) only > >>>> re-enabling GFXOFF once the job had actually completed*, and b) apply the > >>>> same workaround on other rings which share the same GFXOFF powergating > >>>> domain. > >>> > >>> The point of GFXOFF is to turn GFX on/off *without* kernel driver > >>> interaction. Otherwise we don't need it in the first place. > >>> > >>> We just have a hack for the SDMA because that moved into the GFXOFF > >>> domain with Navi and is broken on some HW generations IIRC. > >>> > >>>> > >>>> With these two applied, the GFXOFF re-enable requests are avoided > >>>> altogether during persistent activity on the GFX ring and simultaneous > >>>> sporadic activity on the SDMA ring. > >>>> > >>>> This has a positive effect of drastically reducing SDMA submission > >>>> latencies. For example during the Cyberpunk 2077 benchmark, they are > >>>> reduced from an average of 64us (stdev 60) to 9us (stdev 6). Or more > >>>> importantly the worst case latency, averaged to a one second window, is > >>>> reduced from 305us to 30us**. > >>>> > >>>> *) For ease of implementation we put the re-enable at the job free stage, > >>>> since doing it on actual completion is problematic in terms of locking. > >>> > >>> Absolutely clear NAK to this. Never ever base anything on the job > >>> livetime! > >>> > >>> We already had enough trouble with that. > >>> > >>>> > >>>> **) Submission latency ewma averaged (DECLARE_EWMA(latency, 6, 4)) - > >>>> Approximately 30 SDMA submissions per second, ewma average logged once > >>>> per second therefore significantly hides the worst case latency. Eg. > >>>> the real improvement in max submission latency is severely understated by > >>>> these numbers. > >>> > >>> Well that would indeed be quite nice to have. > >>> > >>> Regards, > >>> Christian. > >>> > >>>> > >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursu...@igalia.com> > >>>> References: 94b1e028e15c ("drm/amdgpu/sdma5.2: add begin/end_use ring > >>>> callbacks") > >>>> Cc: Mario Limonciello <mario.limoncie...@amd.com> > >>>> Cc: Christian König <christian.koe...@amd.com> > >>>> Cc: Alex Deucher <alexander.deuc...@amd.com> > >>>> --- > >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 1 + > >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 8 ++++++++ > >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 7 +++++++ > >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 2 ++ > >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 3 +++ > >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 + > >>>> drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 1 + > >>>> 7 files changed, 23 insertions(+) > >>>> > >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h > >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h > >>>> index 08f268dab8f5..eee40f385793 100644 > >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h > >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h > >>>> @@ -475,6 +475,7 @@ struct amdgpu_gfx { > >>>> uint32_t compute_supported_reset; > >>>> > >>>> /* gfx off */ > >>>> + bool gfx_off_held; /* true: rings > >>>> hold gfx_off */ > >>>> bool gfx_off_state; /* true: > >>>> enabled, false: disabled */ > >>>> struct mutex gfx_off_mutex; /* mutex to > >>>> change gfxoff state */ > >>>> uint32_t gfx_off_req_count; /* default 1, > >>>> enable gfx off: dec 1, disable gfx off: add 1 */ > >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c > >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c > >>>> index 206b70acb29a..bf9bffe40235 100644 > >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c > >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c > >>>> @@ -191,6 +191,14 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, > >>>> unsigned int num_ibs, > >>>> return r; > >>>> } > >>>> > >>>> + if (job && adev->gfx.gfx_off_held && > >>>> + (ring->funcs->type == AMDGPU_RING_TYPE_GFX || > >>>> + ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE || > >>>> + ring->funcs->type == AMDGPU_RING_TYPE_SDMA)) { > >>>> + amdgpu_gfx_off_ctrl(adev, false); > >>>> + job->gfx_off_held = true; > >>>> + } > >>>> + > >>>> need_ctx_switch = ring->current_ctx != fence_ctx; > >>>> if (ring->funcs->emit_pipeline_sync && job && > >>>> ((tmp = amdgpu_sync_get_fence(&job->explicit_sync)) || > >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > >>>> index 2b58e353cca1..4cfd175ac6df 100644 > >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > >>>> @@ -191,6 +191,7 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, > >>>> struct amdgpu_vm *vm, > >>>> if (!*job) > >>>> return -ENOMEM; > >>>> > >>>> + (*job)->adev = adev; > >>>> (*job)->vm = vm; > >>>> > >>>> amdgpu_sync_create(&(*job)->explicit_sync); > >>>> @@ -268,6 +269,9 @@ static void amdgpu_job_free_cb(struct drm_sched_job > >>>> *s_job) > >>>> > >>>> amdgpu_sync_free(&job->explicit_sync); > >>>> > >>>> + if (job->gfx_off_held) > >>>> + amdgpu_gfx_off_ctrl(job->adev, true); > >>>> + > >>> > >>> > >>> > >>> > >>>> /* only put the hw fence if has embedded fence */ > >>>> if (!job->hw_fence.base.ops) > >>>> kfree(job); > >>>> @@ -301,6 +305,9 @@ void amdgpu_job_free(struct amdgpu_job *job) > >>>> if (job->gang_submit != &job->base.s_fence->scheduled) > >>>> dma_fence_put(job->gang_submit); > >>>> > >>>> + if (job->gfx_off_held) > >>>> + amdgpu_gfx_off_ctrl(job->adev, true); > >>>> + > >>>> if (!job->hw_fence.base.ops) > >>>> kfree(job); > >>>> else > >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h > >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h > >>>> index 2f302266662b..d4ab832ac193 100644 > >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h > >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h > >>>> @@ -46,6 +46,7 @@ enum amdgpu_ib_pool_type; > >>>> > >>>> struct amdgpu_job { > >>>> struct drm_sched_job base; > >>>> + struct amdgpu_device *adev; > >>>> struct amdgpu_vm *vm; > >>>> struct amdgpu_sync explicit_sync; > >>>> struct amdgpu_fence hw_fence; > >>>> @@ -55,6 +56,7 @@ struct amdgpu_job { > >>>> bool vm_needs_flush; > >>>> bool gds_switch_needed; > >>>> bool spm_update_needed; > >>>> + bool gfx_off_held; > >>>> uint64_t vm_pd_addr; > >>>> unsigned vmid; > >>>> unsigned pasid; > >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c > >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c > >>>> index 426834806fbf..22cac94e2f2a 100644 > >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c > >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c > >>>> @@ -350,6 +350,9 @@ int amdgpu_ring_init(struct amdgpu_device *adev, > >>>> struct amdgpu_ring *ring, > >>>> ring->max_dw = max_dw; > >>>> ring->hw_prio = hw_prio; > >>>> > >>>> + if (ring->funcs->gfx_off_held) > >>>> + adev->gfx.gfx_off_held = true; > >>>> + > >>>> if (!ring->no_scheduler && ring->funcs->type < AMDGPU_HW_IP_NUM) { > >>>> hw_ip = ring->funcs->type; > >>>> num_sched = &adev->gpu_sched[hw_ip][hw_prio].num_scheds; > >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > >>>> index 784ba2ec354c..afaf951b0b78 100644 > >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > >>>> @@ -202,6 +202,7 @@ struct amdgpu_ring_funcs { > >>>> bool support_64bit_ptrs; > >>>> bool no_user_fence; > >>>> bool secure_submission_supported; > >>>> + bool gfx_off_held; > >>>> unsigned extra_dw; > >>>> > >>>> /* ring read/write ptr handling */ > >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c > >>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c > >>>> index 42a25150f83a..c88de65e82bc 100644 > >>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c > >>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c > >>>> @@ -1944,6 +1944,7 @@ static const struct amdgpu_ring_funcs > >>>> sdma_v5_2_ring_funcs = { > >>>> .nop = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP), > >>>> .support_64bit_ptrs = true, > >>>> .secure_submission_supported = true, > >>>> + .gfx_off_held = true, > >>>> .get_rptr = sdma_v5_2_ring_get_rptr, > >>>> .get_wptr = sdma_v5_2_ring_get_wptr, > >>>> .set_wptr = sdma_v5_2_ring_set_wptr, > >>> > >> >