On 04/06/2026 18:35, Adrián Larumbe wrote:
> During device probe(), failure to do a PM get() will leave the usage_count
> set to 0, which is the value assigned at device creation time. That means
> when the autosuspend delay expires, runtime suspend callback won't be
> invoked, so the device will remain powered on forever.
>
> On top of that, failure to call PM put() during device unplug means
> Panfrost device's PM usage_count increases monotonically for every new
> module reload.
>
> The combined outcome of both of the above was that devfreq OPP transition
> notifications would be printed all the time, even when no jobs are being
> submitted. This quickly fills the kernel ring buffer with junk.
>
> Even direr than that was the fact MMU interrupts are only enabled when
> the device is reset, so after device probe() the very first job targeting
> the tiler heap BO would always time out, because the driver's PM runtime
> resume callback would not be invoked.
>
> Signed-off-by: Adrián Larumbe <[email protected]>
> Fixes: 635430797d3f ("drm/panfrost: Rework runtime PM initialization")
> Fixes: 876b15d2c88d ("drm/panfrost: Fix module unload")
> ---
>  drivers/gpu/drm/panfrost/panfrost_drv.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c
> b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 2d4b6aa95c66..545fbf2c8d0c 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -989,6 +989,7 @@ static int panfrost_probe(struct platform_device *pdev)
>  	pm_runtime_set_active(pfdev->base.dev);
>  	pm_runtime_mark_last_busy(pfdev->base.dev);
>  	pm_runtime_enable(pfdev->base.dev);
> +	pm_runtime_get_noresume(pfdev->base.dev);
>  	pm_runtime_set_autosuspend_delay(pfdev->base.dev, 50); /* ~3 frames */
>  	pm_runtime_use_autosuspend(pfdev->base.dev);
>
> @@ -1000,10 +1001,12 @@ static int panfrost_probe(struct platform_device
> *pdev)
>  	if (err < 0)
>  		goto err_out1;
>
> +	pm_runtime_put_autosuspend(pfdev->base.dev);
>
>  	return 0;
>
>  err_out1:
> +	pm_runtime_put_noidle(pfdev->base.dev);
>  	pm_runtime_disable(pfdev->base.dev);
>  	panfrost_device_fini(pfdev);

Sashiko is concerned that dropping the usage count before
pm_runtime_disable() could cause things to turn off too early, and I
have to agree it sounds like it could be a problem:

Sashiko wrote:
> Does dropping the usage count before pm_runtime_disable() create a race
> condition where the suspend callback can run and disable clocks before
> hardware shutdown?
> Because the usage count is dropped early, a concurrent PM event could trigger
> the suspend callback, disabling clocks. Then, panfrost_device_fini() calls
> panfrost_gpu_fini() which writes to MMIO registers. Could writing to
> unclocked registers on ARM SoCs cause fatal bus errors or panics?

Sashiko also suggests we might have some other (partially pre-existing)
issues here.

https://sashiko.dev/#/patchset/20260604-claude-fixes-v2-0-57c6bd4c1655%40collabora.com

Thanks,
Steve

>  	pm_runtime_set_suspended(pfdev->base.dev);
> @@ -1018,8 +1021,9 @@ static void panfrost_remove(struct platform_device
> *pdev)
>  	drm_dev_unregister(&pfdev->base);
>
>  	pm_runtime_get_sync(pfdev->base.dev);
> -	pm_runtime_disable(pfdev->base.dev);
>  	panfrost_device_fini(pfdev);
> +	pm_runtime_put_noidle(pfdev->base.dev);
> +	pm_runtime_disable(pfdev->base.dev);
>  	pm_runtime_set_suspended(pfdev->base.dev);
>  }
>
>
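
To make the ordering concern in the err_out1 path concrete, here is a
userspace toy model. It is not kernel code: every toy_* name is made up for
illustration, and the "concurrent PM event" is simulated by an explicit call
at the point where it could slip in. The model assumes, as the report does,
that a suspend landing while runtime PM is still enabled and the usage count
is already zero leaves the clocks off when teardown writes registers. It says
nothing about how likely that window is on real hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a device's runtime PM state; NOT the kernel's struct device. */
struct toy_dev {
	int usage_count;        /* loosely mirrors the runtime PM usage count */
	bool pm_enabled;        /* true until toy_disable() is called */
	bool clocks_on;         /* cleared once the simulated suspend runs */
	bool touched_unclocked; /* set if teardown "wrote MMIO" with clocks off */
};

/* Drop one reference without triggering an idle check. */
static void toy_put_noidle(struct toy_dev *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;
}

/* After this, simulated suspend attempts are ignored. */
static void toy_disable(struct toy_dev *d)
{
	d->pm_enabled = false;
}

/* Simulated concurrent PM event, e.g. an autosuspend timer expiring. */
static void toy_concurrent_suspend_attempt(struct toy_dev *d)
{
	if (d->pm_enabled && d->usage_count == 0)
		d->clocks_on = false;
}

/* Simulated hardware teardown that writes registers. */
static void toy_device_fini(struct toy_dev *d)
{
	if (!d->clocks_on)
		d->touched_unclocked = true;
}

/* Ordering from the quoted error path: put, disable, fini. */
bool toy_put_then_disable(void)
{
	struct toy_dev d = { .usage_count = 1, .pm_enabled = true, .clocks_on = true };

	toy_put_noidle(&d);
	toy_concurrent_suspend_attempt(&d); /* window: enabled, count == 0 */
	toy_disable(&d);
	toy_device_fini(&d);
	return d.touched_unclocked;
}

/* One alternative ordering in the model: disable, fini, put. */
bool toy_disable_then_put(void)
{
	struct toy_dev d = { .usage_count = 1, .pm_enabled = true, .clocks_on = true };

	toy_disable(&d);
	toy_concurrent_suspend_attempt(&d); /* ignored: runtime PM disabled */
	toy_device_fini(&d);
	toy_put_noidle(&d);
	return d.touched_unclocked;
}
```

In this model toy_put_then_disable() reports that teardown touched unclocked
hardware, while toy_disable_then_put() does not. The second ordering is only
one way to close the window in the toy, not a tested fix for the driver.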
