On Wed, Aug 13, 2025 at 9:55 AM Sundararaju, Sathishkumar
<sathishkumar.sundarar...@amd.com> wrote:
>
>
> On 8/13/2025 7:15 PM, Alex Deucher wrote:
> > On Tue, Aug 12, 2025 at 7:08 PM David Wu <david...@amd.com> wrote:
> >> Hi Alex,
> >>
> >> still have a concern - the fence or submission_cnt could increase in
> >> begin_use but idle handler
> >> has finished counting it (as 0) so it could also power off vcn.
> > Ok, I think that is possible.  Will send an updated patch to handle
> > that case as well.
> cancel_delayed_work_sync(&vcn_inst->idle_work) at the top of begin_use
> covers this situation.
> So there is no idle handler for this instance anymore after this call
> completes, but the additional checks are okay to have.
There are separate work handlers for each instance.  What could happen
is that the instance 0 work handler is running when begin_use() is
running on instance 1.  The instance 1 begin_use() sees that the video
profile is already enabled.  The instance 0 work handler sees that
total_fences is 0 and disables the video profile because the fence for
instance 1 has not been emitted yet.  It won't be emitted until after
begin_use() completes.  The total_submission_cnt covers that period of
time (between begin_ring and end_ring until the actual fence is
emitted).

Alex

> Regards,
> Sathish
>
> > Alex
> >
> >> David
> >>
> >> On 2025-08-12 16:58, Alex Deucher wrote:
> >>> If there are multiple instances of the VCN running,
> >>> we may end up switching the video profile while another
> >>> instance is active because we only take into account
> >>> the current instance's submissions.  Look at all
> >>> outstanding fences for the video profile.
> >>>
> >>> v2: drop early exit in begin_use()
> >>>
> >>> Fixes: 3b669df92c85 ("drm/amdgpu/vcn: adjust workload profile handling")
> >>> Reviewed-by: Sathishkumar S <sathishkumar.sundarar...@amd.com> (v1)
> >>> Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
> >>> ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 37 ++++++++++++-------------
> >>>  1 file changed, 17 insertions(+), 20 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> >>> index 9a76e11d1c184..fd127e9461c89 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> >>> @@ -415,19 +415,25 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)
> >>>  	struct amdgpu_vcn_inst *vcn_inst =
> >>>  		container_of(work, struct amdgpu_vcn_inst, idle_work.work);
> >>>  	struct amdgpu_device *adev = vcn_inst->adev;
> >>> -	unsigned int fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
> >>> -	unsigned int i = vcn_inst->inst, j;
> >>> +	unsigned int total_fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
> >>> +	unsigned int i, j;
> >>>  	int r = 0;
> >>>
> >>> -	if (adev->vcn.harvest_config & (1 << i))
> >>> +	if (adev->vcn.harvest_config & (1 << vcn_inst->inst))
> >>>  		return;
> >>>
> >>> -	for (j = 0; j < adev->vcn.inst[i].num_enc_rings; ++j)
> >>> -		fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_enc[j]);
> >>> +	for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
> >>> +		struct amdgpu_vcn_inst *v = &adev->vcn.inst[i];
> >>> +
> >>> +		for (j = 0; j < v->num_enc_rings; ++j)
> >>> +			fence[i] += amdgpu_fence_count_emitted(&v->ring_enc[j]);
> >>> +		fence[i] += amdgpu_fence_count_emitted(&v->ring_dec);
> >>> +		total_fences += fence[i];
> >>> +	}
> >>>
> >>>  	/* Only set DPG pause for VCN3 or below, VCN4 and above will be handled by FW */
> >>>  	if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG &&
> >>> -	    !adev->vcn.inst[i].using_unified_queue) {
> >>> +	    !vcn_inst->using_unified_queue) {
> >>>  		struct dpg_pause_state new_state;
> >>>
> >>>  		if (fence[i] ||
> >>> @@ -436,18 +442,17 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)
> >>>  		else
> >>>  			new_state.fw_based = VCN_DPG_STATE__UNPAUSE;
> >>>
> >>> -		adev->vcn.inst[i].pause_dpg_mode(vcn_inst, &new_state);
> >>> +		vcn_inst->pause_dpg_mode(vcn_inst, &new_state);
> >>>  	}
> >>>
> >>> -	fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_dec);
> >>> -	fences += fence[i];
> >>> -
> >>> -	if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
> >>> +	if (!fence[vcn_inst->inst] && !atomic_read(&vcn_inst->total_submission_cnt)) {
> >>> +		/* This is specific to this instance */
> >>>  		mutex_lock(&vcn_inst->vcn_pg_lock);
> >>>  		vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
> >>>  		mutex_unlock(&vcn_inst->vcn_pg_lock);
> >>>  		mutex_lock(&adev->vcn.workload_profile_mutex);
> >>> -		if (adev->vcn.workload_profile_active) {
> >>> +		/* This is global and depends on all VCN instances */
> >>> +		if (adev->vcn.workload_profile_active && !total_fences) {
> >>>  			r = amdgpu_dpm_switch_power_profile(adev,
> >>>  							    PP_SMC_POWER_PROFILE_VIDEO,
> >>>  							    false);
> >>>  			if (r)
> >>> @@ -470,13 +475,6 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
> >>>
> >>>  	cancel_delayed_work_sync(&vcn_inst->idle_work);
> >>>
> >>> -	/* We can safely return early here because we've cancelled the
> >>> -	 * the delayed work so there is no one else to set it to false
> >>> -	 * and we don't care if someone else sets it to true.
> >>> -	 */
> >>> -	if (adev->vcn.workload_profile_active)
> >>> -		goto pg_lock;
> >>> -
> >>>  	mutex_lock(&adev->vcn.workload_profile_mutex);
> >>>  	if (!adev->vcn.workload_profile_active) {
> >>>  		r = amdgpu_dpm_switch_power_profile(adev,
> >>>  						    PP_SMC_POWER_PROFILE_VIDEO,
> >>> @@ -487,7 +485,6 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
> >>>  	}
> >>>  	mutex_unlock(&adev->vcn.workload_profile_mutex);
> >>>
> >>> -pg_lock:
> >>>  	mutex_lock(&vcn_inst->vcn_pg_lock);
> >>>  	vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
> >>>