On Thu, Aug 14, 2025 at 3:18 PM David Wu <david...@amd.com> wrote:
On 2025-08-14 14:00, Alex Deucher wrote:
On Thu, Aug 14, 2025 at 12:44 PM David Wu <david...@amd.com> wrote:
On 2025-08-14 12:01, Alex Deucher wrote:
On Thu, Aug 14, 2025 at 11:35 AM David Wu <david...@amd.com> wrote:
On 2025-08-14 08:56, Alex Deucher wrote:
On Wed, Aug 13, 2025 at 7:06 PM Wu, David <david...@amd.com> wrote:
On 8/13/2025 6:11 PM, Alex Deucher wrote:
On Wed, Aug 13, 2025 at 5:47 PM Wu, David <david...@amd.com> wrote:
On 8/13/2025 5:03 PM, Alex Deucher wrote:
On Wed, Aug 13, 2025 at 4:58 PM Sundararaju, Sathishkumar
<sathishkumar.sundarar...@amd.com> wrote:
On 8/14/2025 1:35 AM, Alex Deucher wrote:
On Wed, Aug 13, 2025 at 2:23 PM Sundararaju, Sathishkumar
<sathishkumar.sundarar...@amd.com> wrote:
Hi Alex, Hi David,
I see David's concern, but his suggestion still won't solve the problem,
and neither does the current form. The reason:
The emitted fence count and total submission count are fast transients
which frequently become 0 in between video decodes (between jobs), so even
with the atomics and locks there can be a switch of the video power profile.
In the current form of the patch that window is minimized, but it can still
happen under stress testing. But the power state of any instance becoming zero
Can you explain how this can happen? I'm not seeing it.
Consider this situation: inst0 and inst1 are actively decoding, inst0's
decode completes, and the delayed idle work starts.
inst0's idle handler can read 0 total fences and 0 total submission count
even if inst1 is actively decoding, i.e. between jobs:
- as begin_use increments vcn.total_submission_cnt and end_use
decrements vcn.total_submission_cnt, it can be 0.
- if the outstanding fences are cleared and no new fence has been emitted
between jobs, it can be 0.
- neither of the above conditions means video decode is complete on
inst1; it is still actively decoding.
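A minimal sketch of the window being described, using simplified stand-ins
for the counters above (not the actual amdgpu structures):

#include <linux/atomic.h>

struct vcn_counters {
	atomic_t total_submission_cnt;	/* ++ in begin_use(), -- in end_use() */
	atomic_t emitted_fences;	/* outstanding fences across instances */
};

/* What the idle handler effectively samples.  Between two jobs of an
 * active decode session both counters can legitimately read 0: end_use()
 * has already decremented total_submission_cnt and the last emitted fence
 * has signalled, yet the next job may arrive only milliseconds later.
 * Acting on this snapshot can switch the power profile under an instance
 * that is still decoding from the session's point of view.
 */
static bool vcn_looks_idle(struct vcn_counters *c)
{
	return atomic_read(&c->total_submission_cnt) == 0 &&
	       atomic_read(&c->emitted_fences) == 0;
}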
How can there be active decoding without an outstanding fence? In
that case, total_fences (fences from both instances) would be non-0.
I think it should be non-0.
I do see a possible hiccup - i.e. the power switching from ON to OFF and
then back ON in the middle of decoding, i.e. inst0's idle handler turns it
off and then inst1 turns it back on.
How would that happen? As long as the submission cnt is non-0 and
there are outstanding fences on any instance, the video profile will
stay active.
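A sketch of what that check amounts to, assuming a per-instance submission
counter (older code keeps a single vcn.total_submission_cnt) and using
amdgpu_fence_count_emitted() for the outstanding-fence count; the helper
name is made up for illustration:

static bool vcn_any_instance_busy(struct amdgpu_device *adev)
{
	/* Illustrative helper (hypothetical name): the video profile stays
	 * enabled while any instance still has work in flight.  Only the
	 * decode ring is shown for brevity; a real check would also cover
	 * the encode rings.
	 */
	unsigned int i;

	for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
		struct amdgpu_vcn_inst *inst = &adev->vcn.inst[i];

		if (atomic_read(&inst->total_submission_cnt) ||
		    amdgpu_fence_count_emitted(&inst->ring_dec))
			return true;
	}

	return false;
}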
there could be no jobs, but it hasn't timed out yet and new jobs can come
in at any millisecond - note all fences are done at this time. The idle
handler sees no fences and no jobs, so it turns off the power - but just
milliseconds later a new job is submitted from the same decode session,
which could be an mpv player, since it does not necessarily submit jobs
back to back. This will turn the power back on.
I'm not following. Every submission will start with begin_use().
yes - it does - it can power on VCN, but this happens in the middle of a
decode session which has a 10s timeout before calling its own idle handler -
in fact the other instance's idle handler will power off VCN because it does
not know it needs to wait until the decoding session times out.
I don't follow. If there are no outstanding fences, there is no
reason to not power down the VCN instance and disable the video
profile. If there are still outstanding fences, then the VCN instance
those fences are associated with will stay on and the video profile
will stay enabled. If the engine hangs and eventually gets reset, the
fence will be signalled and then there will be no outstanding fences
so the idle handler will eventually disable the power profile. The
idle handler will keep getting rescheduled as long as there is still
outstanding work.
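A minimal sketch of that per-instance idle handler logic, reusing the
vcn_any_instance_busy() helper sketched above. The container_of() layout,
the adev back-pointer, the per-instance idle_work field and the
vcn_powergate_inst() helper are assumptions for illustration, while
amdgpu_dpm_switch_power_profile() and schedule_delayed_work() are the real
calls:

static void vcn_idle_work_handler(struct work_struct *work)
{
	/* Sketch of the logic described above, not the actual handler. */
	struct amdgpu_vcn_inst *inst =
		container_of(work, struct amdgpu_vcn_inst, idle_work.work);
	struct amdgpu_device *adev = inst->adev;	/* assumed back-pointer */

	if (amdgpu_fence_count_emitted(&inst->ring_dec)) {
		/* Still outstanding work on this instance: check again later. */
		schedule_delayed_work(&inst->idle_work, VCN_IDLE_TIMEOUT);
		return;
	}

	/* This instance is idle: gate it. */
	vcn_powergate_inst(inst);	/* hypothetical per-instance gate helper */

	/* Disable the video profile only once no instance has outstanding
	 * fences or pending submissions.
	 */
	if (!vcn_any_instance_busy(adev))
		amdgpu_dpm_switch_power_profile(adev,
						PP_SMC_POWER_PROFILE_VIDEO,
						false);
}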
inst0 and inst1:
inst0 sends jobA, then ends jobA, and no more jobs are submitted for 500ms
and the job queue is empty - at this point inst1's idle handler sees no
outstanding fences/jobs and powers off. However, inst0 starts to submit a
job after 500ms - inst0's idle handler has not been scheduled to run yet,
but inst1's has already finished, and it does not know that inst0 has not
timed out or called its own idle handler. This violates the logic of the
idle handler's timeout condition (i.e. a 10s timeout was designed, but it
timed out in 500ms).
All this means it powered down too early for inst0.
I still don't follow. Here's a sample flow. Job comes in on inst 0
and then slightly later on inst 1.
Inst 0 job submission:
Inst 0 calls begin_use(). This cancels the current inst worker
thread. It enables the video profile and ungates the instance.
IBs and fence packets get submitted to instance 0 of the engine
Inst 0 calls end_use(). This schedules the worker thread for
VCN_IDLE_TIMEOUT jiffies in the future.
Inst 1 job submission:
Inst 1 calls begin_use(). This cancels the current inst worker
thread. It sees the video profile is enabled and ungates the
instance.
IBs and fence packets get submitted to instance 1 of the engine
Inst 1 calls end_use(). This schedules the worker thread for
VCN_IDLE_TIMEOUT jiffies in the future.
inst 0 work handler runs. Sees outstanding fences on inst 0; skips
powergating inst 0, skips disabling video profile. Schedules the
worker thread for VCN_IDLE_TIMEOUT jiffies in the future.
inst 0 IB completes and fence signals
inst 1 IB completes and fence signals
inst 1 work handler runs. Sees no outstanding fences on inst 1.
Powergates inst 1. Checks if there are any outstanding fences on other
instances. Sees no fences from inst 0, so it disables the video
profile.
now there are jobs coming from inst0, so inst 0 idle handler won't run.
inst 0 work handler runs. Sees no outstanding fences on inst 0.
Powergates inst 0. Checks if there are any outstanding fences on other
instances. Sees no fences from inst 1 and sees that the video profile is
already disabled.
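For reference, a minimal sketch of the begin_use()/end_use() pairing assumed
in the flow above. The amdgpu_dpm_switch_power_profile(),
cancel_delayed_work_sync() and schedule_delayed_work() calls are real, but
the per-instance fields (total_submission_cnt, idle_work), the use of
ring->me as the instance index and the vcn_ungate_inst() helper are
simplifications for illustration, not the driver's exact code:

static void vcn_ring_begin_use(struct amdgpu_ring *ring)
{
	struct amdgpu_device *adev = ring->adev;
	struct amdgpu_vcn_inst *inst = &adev->vcn.inst[ring->me];

	atomic_inc(&inst->total_submission_cnt);

	/* Cancel any pending idle check so it cannot race this submission. */
	cancel_delayed_work_sync(&inst->idle_work);

	/* Enable the video power profile (if not already enabled) and
	 * ungate the instance before the IB is submitted.
	 */
	amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_VIDEO, true);
	vcn_ungate_inst(inst);	/* hypothetical per-instance ungate helper */
}

static void vcn_ring_end_use(struct amdgpu_ring *ring)
{
	struct amdgpu_vcn_inst *inst = &ring->adev->vcn.inst[ring->me];

	atomic_dec(&inst->total_submission_cnt);

	/* Re-arm the idle check VCN_IDLE_TIMEOUT jiffies in the future. */
	schedule_delayed_work(&inst->idle_work, VCN_IDLE_TIMEOUT);
}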
inst 0 work handler runs? It could or could not - right? It depends on
whether there are more jobs for inst0 and also whether VCN_IDLE_TIMEOUT
jiffies have passed for inst0. There is possibly a point in the sequence
where inst0 stops submitting jobs but its idle handler has not run yet.
Should we wait until all instances have finished their idle handlers?
The work handler will run unless it is cancelled in begin_use(), but
in that case it will get scheduled again in end_use(). If it runs,
but there is still outstanding work (fences), then it will get
scheduled to run again from the work handler. All we care about from
the power and video profile perspective is whether or not there are
still outstanding fences. Once all fences are completed on all
instances, we can disable the video profile regardless of whether the
idle handlers have run or not, because the hw is already idle.
if not, then we will run into a power OFF (by inst1) -> ON (by inst0)
transition for the active instance (inst0, which is expected to stay ON).
(By active I mean instances that have not timed out within VCN_IDLE_TIMEOUT
jiffies and can still submit jobs at any time.)
If this can happen, then we powered off too early. (I did not say we cannot
do it, but it is not expected.)
VCN_IDLE_TIMEOUT is not a timeout. We could call it something else;
VCN_IDLE_CHECK_PERIOD would be a better description. It just runs in the
future to check whether all of the fences are signalled. If they are not
signalled, we schedule the worker to run again further in the future.
If you reduce the time, it will check more often; you can potentially
save more power, but you add more CPU overhead because the handler may
run more often.
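In code terms it is just the period of a re-armed delayed work item, not a
deadline. The 1 second value below is an assumption for this sketch; the
authoritative definition lives in amdgpu_vcn.h:

/* The "timeout" is really the polling period of the idle check: the
 * handler re-arms itself while fences are outstanding, so shrinking the
 * value gates power sooner at the cost of running the handler more often.
 */
#define VCN_IDLE_TIMEOUT	msecs_to_jiffies(1000)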
Alex
You can insert additional job submissions anywhere you want in the timeline.
Alex