On Mon, Aug 18, 2025 at 3:23 PM David Wu <david...@amd.com> wrote:
>
> Hi Lijo and Alex,
>
> I prefer Lijo's method as it should not cause an unexpected power profile
> change for the active session.
> Lijo, could you make some adjustments to your patch, such as removing the
> extra blank line and fixing the function declarations, e.g. using "void"
> instead of "int" for the new functions (should they be static?)
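
For illustration, the declaration style being requested would look something
like the sketch below. amdgpu_vcn_get_profile is taken from Lijo's patch
fragment quoted later in this thread; amdgpu_vcn_put_profile is a
hypothetical counterpart:

	/* Sketch only: "void" return type and static linkage, per the
	 * review comment above. amdgpu_vcn_put_profile is hypothetical.
	 */
	static void amdgpu_vcn_get_profile(struct amdgpu_device *adev);
	static void amdgpu_vcn_put_profile(struct amdgpu_device *adev);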
I still don't understand how you can get a case where the profile would
change for the active session with my patch.  That said, I think Lijo's
patch is nicer because it avoids the extra atomic.

Alex

>
> Thanks,
>
> David
>
> On 2025-08-14 14:42, David Wu wrote:
>
> hmm.. it was my concern for the same instance, but I got it now. Your patch is
> good.
> Thanks,
> David
> On 2025-08-14 14:06, Lazar, Lijo wrote:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
>
> I didn't fully understand the question.
>
> For the same instance, the begin_use thread will set the power state only
> after cancelling any idle worker for the instance. If the idle worker is
> already in progress, then it needs to complete, as that's a cancel_sync
> (it's the existing logic).
>
> Basically, by the time the begin_use thread sets the PG state, no idle
> worker for the same VCN instance would be active. If it's about a context
> switch to another VCN instance's begin_use thread, I think that won't be a
> problem.
>
> Thanks,
> Lijo
> ________________________________
> From: Wu, David <david....@amd.com>
> Sent: Thursday, August 14, 2025 11:14:26 PM
> To: Lazar, Lijo <lijo.la...@amd.com>; Sundararaju, Sathishkumar <sathishkumar.sundarar...@amd.com>; Alex Deucher <alexdeuc...@gmail.com>
> Cc: Wu, David <david....@amd.com>; Deucher, Alexander <alexander.deuc...@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)
>
> amdgpu_vcn_idle_work_handler():
>     if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
> ----------- could a context switch to amdgpu_vcn_ring_begin_use() happen here?
> If it could, then AMD_PG_STATE_GATE would be set by mistake.
>
> David
>
> On 2025-08-14 08:54, Lazar, Lijo wrote:
> > [Public]
> >
> > The request-profile call can be moved outside the pg_lock in begin_use as
> > in the attached patch. It needs the "set power state -> set profile" order.
> >
> > This is the premise -
> >
> > Let's say there are two threads, a begin_use thread and an idle_work thread.
> > begin_use and idle_work will need the workprofile mutex to request a
> > profile.
> >
> > Case 1) Idle thread gets the lock first.
> >     a) Idle thread sees the vinst power state as PG_UNGATE; no harm done.
> > It exits without requesting a power profile change. The begin_use thread
> > gets the lock next; it sees the profile as active and continues.
> >     b) Idle thread sees the vinst power state as PG_GATE; it will set
> > workprofile_active to false and exit. Now when the begin_use thread gets
> > the mutex next, it's guaranteed to see workprofile_active as false, hence
> > it will request the profile.
> >
> > Case 2) begin_use thread gets the lock first.
> >     a) Workload profile is active, hence it doesn't do anything and
> > exits. The change made by the begin_use thread to the vinst power state
> > (state = on) will now be visible to the idle thread, which gets the lock
> > next. It will do nothing and exit.
> >     b) Workload profile is inactive, hence it requests a profile
> > change. Again, the change made by the begin_use thread to the vinst power
> > state will now be visible to the idle thread, which gets the lock next.
> > It will do nothing and exit.
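
To make the ordering in Lijo's premise concrete, here is a minimal sketch,
not the actual patch; it assumes the field and helper names used elsewhere
in this thread (vcn_pg_lock, set_pg_state, workload_profile_mutex,
workload_profile_active, cur_state), and the function names themselves are
invented for illustration:

	/* Sketch only: power the instance on first, then request the
	 * profile under the global lock. Any thread that takes
	 * workload_profile_mutex after this is guaranteed to observe this
	 * instance as ungated. Return values ignored for brevity.
	 */
	static void vcn_begin_use_sketch(struct amdgpu_vcn_inst *vcn_inst)
	{
		struct amdgpu_device *adev = vcn_inst->adev;

		mutex_lock(&vcn_inst->vcn_pg_lock);
		vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
		mutex_unlock(&vcn_inst->vcn_pg_lock);

		mutex_lock(&adev->vcn.workload_profile_mutex);
		if (!adev->vcn.workload_profile_active) {
			amdgpu_dpm_switch_power_profile(adev,
							PP_SMC_POWER_PROFILE_VIDEO,
							true);
			adev->vcn.workload_profile_active = true;
		}
		mutex_unlock(&adev->vcn.workload_profile_mutex);
	}

	/* Sketch only: gate this instance first, then re-check every
	 * instance's power state under the global lock before dropping
	 * the profile (Case 1 above).
	 */
	static void vcn_idle_sketch(struct amdgpu_vcn_inst *vcn_inst)
	{
		struct amdgpu_device *adev = vcn_inst->adev;
		bool all_gated = true;
		unsigned int i;

		mutex_lock(&vcn_inst->vcn_pg_lock);
		vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
		mutex_unlock(&vcn_inst->vcn_pg_lock);

		mutex_lock(&adev->vcn.workload_profile_mutex);
		for (i = 0; i < adev->vcn.num_vcn_inst; ++i)
			if (adev->vcn.inst[i].cur_state != AMD_PG_STATE_GATE)
				all_gated = false;
		if (all_gated && adev->vcn.workload_profile_active) {
			amdgpu_dpm_switch_power_profile(adev,
							PP_SMC_POWER_PROFILE_VIDEO,
							false);
			adev->vcn.workload_profile_active = false;
		}
		mutex_unlock(&adev->vcn.workload_profile_mutex);
	}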
> >
> > Thanks,
> > Lijo
> >
> > -----Original Message-----
> > From: Sundararaju, Sathishkumar <sathishkumar.sundarar...@amd.com>
> > Sent: Thursday, August 14, 2025 6:18 PM
> > To: Lazar, Lijo <lijo.la...@amd.com>; Alex Deucher <alexdeuc...@gmail.com>
> > Cc: Wu, David <david....@amd.com>; Deucher, Alexander <alexander.deuc...@amd.com>; amd-gfx@lists.freedesktop.org
> > Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)
> >
> >
> > On 8/14/2025 5:33 PM, Lazar, Lijo wrote:
> >> [Public]
> >>
> >> There is no need for a nested lock. It only needs to follow the order
> >>     set instance power_state
> >>     set profile (this takes a global lock, and hence the instance power
> >> state will be visible to whichever thread gets the work profile lock).
> >>
> >> You are seeing a nested lock just because I added the code right after
> >> the power state setting.
> > Pasting your code from the file for reference:
> >
> > @@ -464,32 +509,14 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
> >
> > -pg_lock:
> >      mutex_lock(&vcn_inst->vcn_pg_lock);
> >      vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
> > +    amdgpu_vcn_get_profile(adev);
> >
> > vcn_pg_lock isn't released here yet, right? And in case you intend only
> > to order the locks, there is still an unnecessary OFF followed by ON,
> > but yes, that is acceptable.
> >
> > Maybe you want to move that vcn_pg_lock after amdgpu_vcn_get_profile to
> > protect concurrent dpg_state access in begin_use.
> >
> > The concern is that this patch accesses power_state, which is protected
> > by some other mutex lock, hoping it reflects the right values also when
> > holding powerprofile_lock.
> >
> > Or
> >
> > I have shared a patch with a global workload_profile_mutex that
> > simplifies this handling, renamed pg_lock -> dpg_lock, and used
> > that only for dpg_state changes per instance.
> >
> > Regards,
> >
> > Sathish
> >
> >> Thanks,
> >> Lijo
> >>
> >> -----Original Message-----
> >> From: Sundararaju, Sathishkumar <sathishkumar.sundarar...@amd.com>
> >> Sent: Thursday, August 14, 2025 5:23 PM
> >> To: Lazar, Lijo <lijo.la...@amd.com>; Alex Deucher <alexdeuc...@gmail.com>
> >> Cc: Wu, David <david....@amd.com>; Deucher, Alexander <alexander.deuc...@amd.com>; amd-gfx@lists.freedesktop.org
> >> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)
> >>
> >>
> >> On 8/14/2025 3:16 PM, Lazar, Lijo wrote:
> >>> [Public]
> >>>
> >>> I see your point now. Attached should work, I guess. Is the concern more
> >>> about having to take the lock for every begin?
> >> This is closer, but the thing is, IMO we shouldn't have to use 2 locks
> >> and go into nested locking; we can do with just one global lock.
> >>
> >> The power_state of each instance and the global workload_profile_active
> >> are inter-related; they need to be guarded together.
> >>
> >> Nested could work, but why nested if a single lock is enough? Nesting
> >> complicates it.
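
For contrast, a minimal sketch of the single-global-lock scheme argued for
here, assuming one mutex guards both the per-instance power state and
workload_profile_active (illustrative only, not the shared patch):

	/* One lock for both the instance power state and the global
	 * profile flag, so they can never be observed inconsistently.
	 * Return value of the profile switch ignored for brevity.
	 */
	mutex_lock(&adev->vcn.workload_profile_mutex);
	vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
	if (!adev->vcn.workload_profile_active) {
		amdgpu_dpm_switch_power_profile(adev,
						PP_SMC_POWER_PROFILE_VIDEO,
						true);
		adev->vcn.workload_profile_active = true;
	}
	mutex_unlock(&adev->vcn.workload_profile_mutex);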
> >>
> >> Regards,
> >>
> >> Sathish
> >>
> >>> Thanks,
> >>> Lijo
> >>>
> >>> -----Original Message-----
> >>> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Lazar, Lijo
> >>> Sent: Thursday, August 14, 2025 2:55 PM
> >>> To: Sundararaju, Sathishkumar <sathishkumar.sundarar...@amd.com>; Alex Deucher <alexdeuc...@gmail.com>
> >>> Cc: Wu, David <david....@amd.com>; Deucher, Alexander <alexander.deuc...@amd.com>; amd-gfx@lists.freedesktop.org
> >>> Subject: RE: [PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)
> >>>
> >>> [Public]
> >>>
> >>> That is not required, I think. The power profile is set by an instance
> >>> *after* setting itself to power on. Also, it's switched back after
> >>> changing its power state to off. If the idle worker is run by another
> >>> instance, it won't see inst0 as power gated and won't change the
> >>> power profile.
> >>>
> >>> Thanks,
> >>> Lijo
> >>> -----Original Message-----
> >>> From: Sundararaju, Sathishkumar <sathishkumar.sundarar...@amd.com>
> >>> Sent: Thursday, August 14, 2025 2:41 PM
> >>> To: Lazar, Lijo <lijo.la...@amd.com>; Alex Deucher <alexdeuc...@gmail.com>
> >>> Cc: Wu, David <david....@amd.com>; Deucher, Alexander <alexander.deuc...@amd.com>; amd-gfx@lists.freedesktop.org
> >>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)
> >>>
> >>> Hi Lijo,
> >>>
> >>> On 8/14/2025 2:11 PM, Lazar, Lijo wrote:
> >>>> [Public]
> >>>>
> >>>> We already have a per-instance power state that can be tracked. What
> >>>> about something like attached?
> >>> This also has concurrent access of the power state:
> >>> vcn.inst[i].cur_state is not protected by workload_profile_mutex
> >>> everywhere; it can still change while you are holding
> >>> workload_profile_mutex and checking it.
> >>>
> >>> Regards,
> >>>
> >>> Sathish
> >>>
> >>>> Thanks,
> >>>> Lijo
> >>>> -----Original Message-----
> >>>> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Sundararaju, Sathishkumar
> >>>> Sent: Thursday, August 14, 2025 4:43 AM
> >>>> To: Alex Deucher <alexdeuc...@gmail.com>
> >>>> Cc: Wu, David <david....@amd.com>; Deucher, Alexander <alexander.deuc...@amd.com>; amd-gfx@lists.freedesktop.org
> >>>> Subject: Re: [PATCH] drm/amdgpu/vcn: fix video profile race condition (v3)
> >>>>
> >>>>
> >>>> On 8/14/2025 3:38 AM, Alex Deucher wrote:
> >>>>> On Wed, Aug 13, 2025 at 5:1 PM Sundararaju, Sathishkumar
> >>>>> <sathishkumar.sundarar...@amd.com> wrote:
> >>>>>> On 8/14/2025 2:33 AM, Alex Deucher wrote:
> >>>>>>> On Wed, Aug 13, 2025 at 4:58 PM Sundararaju, Sathishkumar
> >>>>>>> <sathishkumar.sundarar...@amd.com> wrote:
> >>>>>>>> On 8/14/2025 1:35 AM, Alex Deucher wrote:
> >>>>>>>>> On Wed, Aug 13, 2025 at 2:23 PM Sundararaju, Sathishkumar
> >>>>>>>>> <sathishkumar.sundarar...@amd.com> wrote:
> >>>>>>>>>> Hi Alex, Hi David,
> >>>>>>>>>>
> >>>>>>>>>> I see David's concern, but his suggestion won't solve the
> >>>>>>>>>> problem yet; neither does the current form. Reason:
> >>>>>>>>>>
> >>>>>>>>>> The emitted fence count and total submission count are fast
> >>>>>>>>>> transients which frequently become 0 in between video decodes
> >>>>>>>>>> (between jobs). Even with the atomics and locks there can be a
> >>>>>>>>>> switch of the video power profile; in the current form of the
> >>>>>>>>>> patch that window is minimized, but it can still happen if
> >>>>>>>>>> stress tested.
> >>>>>>>>>> But the power state of any instance becoming zero
> >>>>>>>>> Can you explain how this can happen?  I'm not seeing it.
> >>>>>>>> Consider this situation: inst0 and inst1 are actively decoding,
> >>>>>>>> inst0's decode completes, and the delayed idle work starts.
> >>>>>>>> inst0's idle handler can read 0 total fences and 0 total
> >>>>>>>> submission count even if inst1 is actively decoding, if that's
> >>>>>>>> between jobs:
> >>>>>>>>      - as begin_use increments vcn.total_submission_cnt and
> >>>>>>>> end_use decrements vcn.total_submission_cnt, it can be 0.
> >>>>>>>>      - if outstanding fences are cleared and there is no newly
> >>>>>>>> emitted fence, between jobs, it can be 0.
> >>>>>>>>      - both of the above conditions do not mean video decode
> >>>>>>>> is complete on inst1; it is actively decoding.
> >>>>>>> How can there be active decoding without an outstanding fence?
> >>>>>>> In that case, total_fences (fences from both instances) would be
> >>>>>>> non-0.
> >>>>>> I mean on inst1 the job scheduled is already complete, so 0
> >>>>>> outstanding fences, and a newer job is yet to be scheduled
> >>>>>> and committed to the ring (inst1); this doesn't mean decode has
> >>>>>> stopped on inst1, right? (I am saying if the timing of inst0's
> >>>>>> idle work coincides with this.)
> >>>>>>
> >>>>>> Or am I wrong in assuming this? Can't this ever happen? Please
> >>>>>> correct my understanding here.
> >>>>> The flow looks like:
> >>>>>
> >>>>> begin_use(inst)
> >>>>> emit_fence(inst)
> >>>>> end_use(inst)
> >>>>>
> >>>>> ...later
> >>>>> fence signals
> >>>>> ...later
> >>>>> work handler
> >>>>>
> >>>>> In begin_use we increment the global and per-instance submission
> >>>>> counts. This protects the power gating and profile until end_use.
> >>>>> In end_use we decrement the submissions because we don't need to
> >>>>> protect anything any more, as we have the fence that was submitted
> >>>>> via the ring. That fence won't signal until the job is complete.
> >>>> Is a next begin_use always guaranteed to run before the current job's
> >>>> fence signals?
> >>>>
> >>>> If not, then both total submission and total fences are zero; for
> >>>> example, delayed job/packet submissions from user space, or the next
> >>>> job's scheduling happens after the current job's fence signals.
> >>>>
> >>>> If this is never possible, then (v3) is perfect.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Sathish
> >>>>
> >>>>> For power gating, we
> >>>>> only care about the submission count and fences for that instance;
> >>>>> for the profile, we care about the submission count and fences of
> >>>>> all instances.
> >>>>> Once the fences have signalled, the outstanding fences will be 0
> >>>>> and there won't be any active work.
> >>>>>
> >>>>> Alex
> >>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Sathish
> >>>>>>
> >>>>>>> Alex
> >>>>>>>
> >>>>>>>> Whereas if the instances are powered off, we are sure the idle
> >>>>>>>> time is past and the instance is powered off; there is no
> >>>>>>>> possible way of active video decode. When all instances are off
> >>>>>>>> we can safely assume no active decode, and the global lock
> >>>>>>>> protects against a new begin_use on any instance.
> >>>>>>>> The only distant concern is global common locks w.r.t. perf,
> >>>>>>>> but we already have a global workprofile mutex, so there
> >>>>>>>> shouldn't be any drop in perf with just one single global lock
> >>>>>>>> for all instances.
> >>>>>>>>
> >>>>>>>> Just sending out a patch with this fix; I will leave it to you
> >>>>>>>> to decide the right method.
> >>>>>>>> If you think outstanding total fences
> >>>>>>>> can never be 0 during decode, then your previous version (v3)
> >>>>>>>> itself is good; there is no real benefit in splitting the
> >>>>>>>> handlers as such.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Sathish
> >>>>>>>>> If it is possible, maybe it would be easier to just split the
> >>>>>>>>> profile and powergating into separate handlers. The profile
> >>>>>>>>> one would be global and the powergating one would be per instance.
> >>>>>>>>> See the attached patches.
> >>>>>>>>>
> >>>>>>>>> Alex
> >>>>>>>>>
> >>>>>>>>>> can be a sure-shot indication of a break in a video decode. The
> >>>>>>>>>> mistake in my patch was using a per-instance mutex; I should
> >>>>>>>>>> have used a common global mutex, as that covers the situation
> >>>>>>>>>> David is trying to bring out.
> >>>>>>>>>>
> >>>>>>>>>> Using one global vcn.pg_lock for idle and begin_use, and using
> >>>>>>>>>> flags to track the power state, could help us totally avoid
> >>>>>>>>>> this situation.
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>>
> >>>>>>>>>> Sathish
> >>>>>>>>>>
> >>>>>>>>>> On 8/13/2025 11:46 PM, Wu, David wrote:
> >>>>>>>>>>> On 8/13/2025 12:51 PM, Alex Deucher wrote:
> >>>>>>>>>>>> On Wed, Aug 13, 2025 at 12:39 PM Wu, David <david...@amd.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>> Hi Alex,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The addition of total_submission_cnt should work - in that
> >>>>>>>>>>>>> it is unlikely to have a context switch right after
> >>>>>>>>>>>>> begin_use().
> >>>>>>>>>>>>> The suggestion of moving it inside the lock (which I prefer,
> >>>>>>>>>>>>> in case someone adds more before the lock and it is not
> >>>>>>>>>>>>> reviewed thoroughly)
> >>>>>>>>>>>>> - up to you to decide.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Reviewed-by: David (Ming Qiang) Wu <david....@amd.com>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> David
> >>>>>>>>>>>>> On 8/13/2025 9:45 AM, Alex Deucher wrote:
> >>>>>>>>>>>>>> If there are multiple instances of the VCN running, we may
> >>>>>>>>>>>>>> end up switching the video profile while another instance
> >>>>>>>>>>>>>> is active because we only take into account the current
> >>>>>>>>>>>>>> instance's submissions.  Look at all outstanding fences
> >>>>>>>>>>>>>> for the video profile.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> v2: drop early exit in begin_use()
> >>>>>>>>>>>>>> v3: handle possible race between begin_use() and the work handler
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Fixes: 3b669df92c85 ("drm/amdgpu/vcn: adjust workload profile handling")
> >>>>>>>>>>>>>> Reviewed-by: Sathishkumar S <sathishkumar.sundarar...@amd.com> (v1)
> >>>>>>>>>>>>>> Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
> >>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 40 ++++++++++++-------------
> >>>>>>>>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h |  1 +
> >>>>>>>>>>>>>>   2 files changed, 21 insertions(+), 20 deletions(-)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> >>>>>>>>>>>>>> index 9a76e11d1c184..593c1ddf8819b 100644
> >>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> >>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> >>>>>>>>>>>>>> @@ -415,19 +415,25 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)
> >>>>>>>>>>>>>>        struct amdgpu_vcn_inst *vcn_inst =
> >>>>>>>>>>>>>>                container_of(work, struct amdgpu_vcn_inst, idle_work.work);
> >>>>>>>>>>>>>>        struct amdgpu_device *adev = vcn_inst->adev;
> >>>>>>>>>>>>>> -      unsigned int fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
> >>>>>>>>>>>>>> -      unsigned int i = vcn_inst->inst, j;
> >>>>>>>>>>>>>> +      unsigned int total_fences = 0, fence[AMDGPU_MAX_VCN_INSTANCES] = {0};
> >>>>>>>>>>>>>> +      unsigned int i, j;
> >>>>>>>>>>>>>>        int r = 0;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -      if (adev->vcn.harvest_config & (1 << i))
> >>>>>>>>>>>>>> +      if (adev->vcn.harvest_config & (1 << vcn_inst->inst))
> >>>>>>>>>>>>>>                return;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -      for (j = 0; j < adev->vcn.inst[i].num_enc_rings; ++j)
> >>>>>>>>>>>>>> -              fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_enc[j]);
> >>>>>>>>>>>>>> +      for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
> >>>>>>>>>>>>>> +              struct amdgpu_vcn_inst *v = &adev->vcn.inst[i];
> >>>>>>>>>>>>>> +
> >>>>>>>>>>>>>> +              for (j = 0; j < v->num_enc_rings; ++j)
> >>>>>>>>>>>>>> +                      fence[i] += amdgpu_fence_count_emitted(&v->ring_enc[j]);
> >>>>>>>>>>>>>> +              fence[i] += amdgpu_fence_count_emitted(&v->ring_dec);
> >>>>>>>>>>>>>> +              total_fences += fence[i];
> >>>>>>>>>>>>>> +      }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>        /* Only set DPG pause for VCN3 or below, VCN4 and above will be handled by FW */
> >>>>>>>>>>>>>>        if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG &&
> >>>>>>>>>>>>>> -          !adev->vcn.inst[i].using_unified_queue) {
> >>>>>>>>>>>>>> +          !vcn_inst->using_unified_queue) {
> >>>>>>>>>>>>>>                struct dpg_pause_state new_state;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>                if (fence[i] ||
> >>>>>>>>>>>>>> @@ -436,18 +442,18 @@ static void amdgpu_vcn_idle_work_handler(struct work_struct *work)
> >>>>>>>>>>>>>>                else
> >>>>>>>>>>>>>>                        new_state.fw_based = VCN_DPG_STATE__UNPAUSE;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -              adev->vcn.inst[i].pause_dpg_mode(vcn_inst, &new_state);
> >>>>>>>>>>>>>> +              vcn_inst->pause_dpg_mode(vcn_inst, &new_state);
> >>>>>>>>>>>>>>        }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -      fence[i] += amdgpu_fence_count_emitted(&vcn_inst->ring_dec);
> >>>>>>>>>>>>>> -      fences += fence[i];
> >>>>>>>>>>>>>> -
> >>>>>>>>>>>>>> -      if (!fences && !atomic_read(&vcn_inst->total_submission_cnt)) {
> >>>>>>>>>>>>>> +      if (!fence[vcn_inst->inst] && !atomic_read(&vcn_inst->total_submission_cnt)) {
> >>>>>>>>>>>>>> +              /* This is specific to this instance */
> >>>>>>>>>>>>>>                mutex_lock(&vcn_inst->vcn_pg_lock);
> >>>>>>>>>>>>>>                vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_GATE);
> >>>>>>>>>>>>>>                mutex_unlock(&vcn_inst->vcn_pg_lock);
> >>>>>>>>>>>>>>                mutex_lock(&adev->vcn.workload_profile_mutex);
> >>>>>>>>>>>>>> -      if (adev->vcn.workload_profile_active) {
> >>>>>>>>>>>>>> +              /* This is global and depends on all VCN instances */
> >>>>>>>>>>>>>> +              if (adev->vcn.workload_profile_active && !total_fences &&
> >>>>>>>>>>>>>> +                  !atomic_read(&adev->vcn.total_submission_cnt)) {
> >>>>>>>>>>>>>>                        r = amdgpu_dpm_switch_power_profile(adev,
> >>>>>>>>>>>>>>                                                            PP_SMC_POWER_PROFILE_VIDEO, false);
> >>>>>>>>>>>>>>                        if (r)
> >>>>>>>>>>>>>> @@ -467,16 +473,10 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
> >>>>>>>>>>>>>>        int r = 0;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>        atomic_inc(&vcn_inst->total_submission_cnt);
> >>>>>>>>>>>>>> +      atomic_inc(&adev->vcn.total_submission_cnt);
> >>>>>>>>>>>>> move this addition down inside the mutex lock
> >>>>>>>>>>>>>>        cancel_delayed_work_sync(&vcn_inst->idle_work);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -      /* We can safely return early here because we've cancelled the
> >>>>>>>>>>>>>> -       * the delayed work so there is no one else to set it to false
> >>>>>>>>>>>>>> -       * and we don't care if someone else sets it to true.
> >>>>>>>>>>>>>> -       */
> >>>>>>>>>>>>>> -      if (adev->vcn.workload_profile_active)
> >>>>>>>>>>>>>> -              goto pg_lock;
> >>>>>>>>>>>>>> -
> >>>>>>>>>>>>>>        mutex_lock(&adev->vcn.workload_profile_mutex);
> >>>>>>>>>>>>> move to here:
> >>>>>>>>>>>>>      atomic_inc(&adev->vcn.total_submission_cnt);
> >>>>>>>>>>>>> I think this should work for multiple instances.
> >>>>>>>>>>>> Why does this need to be protected by the mutex?
> >>>>>>>>>>> hmm.. OK - no need and it is actually better before the mutex.
> >>>>>>>>>>> David
> >>>>>>>>>>>> Alex
> >>>>>>>>>>>>
> >>>>>>>>>>>>> David
> >>>>>>>>>>>>>>        if (!adev->vcn.workload_profile_active) {
> >>>>>>>>>>>>>>                r = amdgpu_dpm_switch_power_profile(adev,
> >>>>>>>>>>>>>>                                                    PP_SMC_POWER_PROFILE_VIDEO,
> >>>>>>>>>>>>>> @@ -487,7 +487,6 @@ void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
> >>>>>>>>>>>>>>        }
> >>>>>>>>>>>>>>        mutex_unlock(&adev->vcn.workload_profile_mutex);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -pg_lock:
> >>>>>>>>>>>>>>        mutex_lock(&vcn_inst->vcn_pg_lock);
> >>>>>>>>>>>>>>        vcn_inst->set_pg_state(vcn_inst, AMD_PG_STATE_UNGATE);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> @@ -528,6 +527,7 @@ void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring)
> >>>>>>>>>>>>>>        atomic_dec(&ring->adev->vcn.inst[ring->me].dpg_enc_submission_cnt);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>        atomic_dec(&ring->adev->vcn.inst[ring->me].total_submission_cnt);
> >>>>>>>>>>>>>> +      atomic_dec(&ring->adev->vcn.total_submission_cnt);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>        schedule_delayed_work(&ring->adev->vcn.inst[ring->me].idle_work,
> >>>>>>>>>>>>>>                              VCN_IDLE_TIMEOUT);
> >>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
> >>>>>>>>>>>>>> index b3fb1d0e43fc9..febc3ce8641ff 100644
> >>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
> >>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
> >>>>>>>>>>>>>> @@ -352,6 +352,7 @@ struct amdgpu_vcn {
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>        uint16_t inst_mask;
> >>>>>>>>>>>>>>        uint8_t num_inst_per_aid;
> >>>>>>>>>>>>>> +      atomic_t total_submission_cnt;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>        /* IP reg dump */
> >>>>>>>>>>>>>>        uint32_t *ip_dump;
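
For reference, the v3 scheme in the diff above reduces to the following
sketch (illustrative only; the profile switch, locking, and DPG handling
are elided with comments):

void amdgpu_vcn_ring_begin_use(struct amdgpu_ring *ring)
{
	struct amdgpu_device *adev = ring->adev;
	struct amdgpu_vcn_inst *vcn_inst = &adev->vcn.inst[ring->me];

	/* The per-instance count gates this instance's power state; the
	 * global count (plus total fences across instances) gates the
	 * video profile.
	 */
	atomic_inc(&vcn_inst->total_submission_cnt);
	atomic_inc(&adev->vcn.total_submission_cnt);
	cancel_delayed_work_sync(&vcn_inst->idle_work);
	/* ... enable the video profile under workload_profile_mutex if
	 * needed, then ungate this instance under vcn_pg_lock ...
	 */
}

void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring)
{
	struct amdgpu_vcn_inst *vcn_inst = &ring->adev->vcn.inst[ring->me];

	atomic_dec(&vcn_inst->total_submission_cnt);
	atomic_dec(&ring->adev->vcn.total_submission_cnt);
	/* The emitted fence now protects the instance until it signals;
	 * the idle worker re-checks both counters and all fences before
	 * gating power or dropping the profile.
	 */
	schedule_delayed_work(&vcn_inst->idle_work, VCN_IDLE_TIMEOUT);
}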