On 10/20/2025 4:56 PM, Sultan Alsawaf wrote:
On Thu, Oct 16, 2025 at 01:55:27PM -0500, Mario Limonciello wrote:
[Why]
Newer VPE microcode has functionality that will decrease DPM level
only when a workload has run for 2 or more seconds.  If VPE is turned
off before this DPM decrease and the PMFW doesn't reset it when
power gating VPE, the SOC can get stuck with a higher DPM level.

This can happen from amdgpu's ring buffer test because it's a short
quick workload for VPE and VPE is turned off after 1s.

[How]
In the idle handler, besides checking that fences are drained, check the
PMFW version to determine whether it will reset DPM when power gating
VPE.  If PMFW will not do this, check the VPE DPM level. If it is not
DPM0, reschedule the delayed work until it is.

Cc: [email protected]
Reported-by: Sultan Alsawaf <[email protected]>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4615
Signed-off-by: Mario Limonciello <[email protected]>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 33 ++++++++++++++++++++++---
  1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
index 474bfe36c0c2..f4932339d79d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
@@ -322,6 +322,26 @@ static int vpe_early_init(struct amdgpu_ip_block *ip_block)
        return 0;
  }
+static bool vpe_need_dpm0_at_power_down(struct amdgpu_device *adev)
+{
+       switch (amdgpu_ip_version(adev, VPE_HWIP, 0)) {
+       case IP_VERSION(6, 1, 1):
+               return adev->pm.fw_version < 0x0a640500;
+       default:
+               return false;
+       }
+}
+
+static int vpe_get_dpm_level(struct amdgpu_device *adev)
+{
+       struct amdgpu_vpe *vpe = &adev->vpe;
+
+       if (!adev->pm.dpm_enabled)
+               return 0;
+
+       return RREG32(vpe_get_reg_offset(vpe, 0, vpe->regs.dpm_request_lv));
+}
+
  static void vpe_idle_work_handler(struct work_struct *work)
  {
        struct amdgpu_device *adev =
@@ -329,11 +349,16 @@ static void vpe_idle_work_handler(struct work_struct *work)
        unsigned int fences = 0;
        fences += amdgpu_fence_count_emitted(&adev->vpe.ring);
+       if (fences)
+               goto reschedule;
-       if (fences == 0)
-               amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VPE, AMD_PG_STATE_GATE);
-       else
-               schedule_delayed_work(&adev->vpe.idle_work, VPE_IDLE_TIMEOUT);
+       if (vpe_need_dpm0_at_power_down(adev) && vpe_get_dpm_level(adev) != 0)
+               goto reschedule;
+
+       amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VPE, AMD_PG_STATE_GATE);
+

Wait a second, there's no return here! My laptop kept getting kicked out of
S0i3 as a result when I'd suspend it, and I found my laptop cooking in my
backpack with its battery mostly drained. :-(

Oh gosh, whoops!  I'll get a fix on the list.

+reschedule:
+       schedule_delayed_work(&adev->vpe.idle_work, VPE_IDLE_TIMEOUT);
  }
  static int vpe_common_init(struct amdgpu_vpe *vpe)
--
2.51.0


Sultan
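
The fall-through pointed out above can be reproduced outside the driver.
What follows is a minimal standalone sketch, not the fix that was sent to
the list. The fake_* helpers and the two counters are hypothetical
stand-ins for schedule_delayed_work() and the power-gating call, and the
handler arguments are simplified for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for driver side effects. */
static int reschedule_count;
static int gate_count;

static void fake_schedule_delayed_work(void) { reschedule_count++; }
static void fake_power_gate(void) { gate_count++; }

/* Control flow as in the quoted patch: no return before the label. */
static void idle_handler_as_posted(unsigned int fences, bool need_dpm0,
				   int dpm_level)
{
	if (fences)
		goto reschedule;

	if (need_dpm0 && dpm_level != 0)
		goto reschedule;

	fake_power_gate();
	/* Execution falls through into the label below. */
reschedule:
	fake_schedule_delayed_work();
}

/* Same flow with an explicit return after gating (one possible fix). */
static void idle_handler_with_return(unsigned int fences, bool need_dpm0,
				     int dpm_level)
{
	if (fences)
		goto reschedule;

	if (need_dpm0 && dpm_level != 0)
		goto reschedule;

	fake_power_gate();
	return;

reschedule:
	fake_schedule_delayed_work();
}

/* Runs both variants on an idle ring and checks the side effects. */
static void compare_handlers(void)
{
	reschedule_count = gate_count = 0;
	idle_handler_as_posted(0, false, 0);
	assert(gate_count == 1 && reschedule_count == 1);

	reschedule_count = gate_count = 0;
	idle_handler_with_return(0, false, 0);
	assert(gate_count == 1 && reschedule_count == 0);
}
```

In the as-posted variant, an idle ring still re-arms the delayed work on
every pass, so the handler keeps firing after VPE has been gated. With the
early return, the work is only re-armed on the two reschedule paths.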
