Hi Geert, On 29/10/2025 14:08, Geert Uytterhoeven wrote: > Hi all, > > While playing with the PowerVR driver on various R-Car SoCs, I ran into > a crash/race condition on Gray Hawk Single (R-Car V4M). After adding > the GPU device node to DTS, the driver fails to probe due to lack of > suitable firmware, as expected:
Thanks for the detailed report! I'll make time to look into this. Do you
encounter a similar issue on other R-Car platforms, or is this exclusive
to the V4M?
>
> powervr fd000000.gpu: Direct firmware load for
> powervr/rogue_36.53.104.796_v1.fw failed with error -2
> powervr fd000000.gpu: [drm] *ERROR* failed to load firmware
> powervr/rogue_36.53.104.796_v1.fw (err=-2)
>
> However, after that it crashes:
>
> Unable to handle kernel NULL pointer dereference at virtual
> address 0000000000000040
> Mem abort info:
> PM: GENPD_STATE_OFF
> ESR = 0x0000000096000004
> PM: sd_count 19
> EC = 0x25: DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> FSC = 0x04: level 0 translation fault
> Data abort info:
> ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [0000000000000040] user address but active_mm is swapper
> Internal error: Oops: 0000000096000004 [#1] SMP
> CPU: 2 UID: 0 PID: 46 Comm: kworker/u16:2 Tainted: G W
> 6.18.0-rc3-arm64-renesas-04934-g585255656363-dirty #3296 PREEMPT
> Tainted: [W]=WARN
> Hardware name: Renesas Gray Hawk Single board based on r8a779h0 (DT)
> Workqueue: pm pm_runtime_work
> pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : default_suspend_ok+0xb4/0x20c
> lr : default_suspend_ok+0x9c/0x20c
> sp : ffff800081e23bc0
> x29: ffff800081e23bd0 x28: ffff800080e2d240 x27: ffff0004402d8bc0
> x26: ffff0004402e2900 x25: ffff000440b554e4 x24: ffff80008108bb70
> x23: 0000000000000001 x22: ffff80008108bb88 x21: ffff800080e2c568
> x20: ffff800080e2d258 x19: ffff000440b55400 x18: 0000000000000006
> x17: 2030683937376138 x16: 72206e6f20646573 x15: ffff800081e23510
> x14: 0000000000000000 x13: 3035616234383138 x12: 3030303866666666
> x11: 0000000000000533 x10: 000000000000005d x9 : 000000000001b64f
> x8 : 7f7f7f7f7f7f7f7f x7 : 205d373032323131 x6 : 0000000000000000
> x5 : 0000000000000030 x4 : 0000000000000000 x3 : 0000000000000043
> x2 : ffff800080e2d258 x1 : ffff80008108bb70 x0 : ffff000440b55400
> Call trace:
> default_suspend_ok+0xb4/0x20c (P)
> genpd_runtime_suspend+0x11c/0x4e0
> __rpm_callback+0x44/0x1cc
> rpm_callback+0x6c/0x78
> rpm_suspend+0x108/0x564
> pm_runtime_work+0xb8/0xbc
> process_one_work+0x144/0x280
> worker_thread+0x2c8/0x3d0
> kthread+0x128/0x1e0
> ret_from_fork+0x10/0x20
> Code: aa1303e0 52800863 b0005661 912dc021 (f9402095)
> ---[ end trace 0000000000000000 ]---
>
> This driver uses manual PM Domain handling for multiple PM Domains. In
> my case, pvr_power_domains_fini() calls dev_pm_domain_detach() (twice),
> which calls dev_pm_put_subsys_data(), and sets dev->power.subsys_data to
> NULL when psd->refcount reaches zero.
>
> Later/in parallel, default_suspend_ok() calls dev_gpd_data():
>
> static inline struct generic_pm_domain_data *dev_gpd_data(struct
> device *dev)
> {
> return to_gpd_data(dev->power.subsys_data->domain_data);
> }
>
> triggering the NULL pointer dereference. Depending on timing, it may
> crash earlier or later in genpd_runtime_suspend(), or not crash at all
> (initially, I saw it only with extra debug prints in the genpd subsystem).
>
> As the driver mixes automatic (devm_*()) and manual cleanup, my first
> guess was that devm_pm_runtime_enable() would keep Runtime PM enabled
> too long after the manual call to pvr_power_domains_fini(), but
> replacing that by manual cleanup:
Regardless of whether it fixes this issue, it feels like we should clean
this up and stick to one cleanup method driver-wide.
>
> --- a/drivers/gpu/drm/imagination/pvr_drv.c
> +++ b/drivers/gpu/drm/imagination/pvr_drv.c
> @@ -1424,7 +1424,7 @@ pvr_probe(struct platform_device *plat_dev)
> if (err)
> goto err_context_fini;
>
> - devm_pm_runtime_enable(&plat_dev->dev);
> + pm_runtime_enable(&plat_dev->dev);
> pm_runtime_mark_last_busy(&plat_dev->dev);
>
> pm_runtime_set_autosuspend_delay(&plat_dev->dev, 50);
> @@ -1450,6 +1450,9 @@ pvr_probe(struct platform_device *plat_dev)
> err_watchdog_fini:
> pvr_watchdog_fini(pvr_dev);
>
> + pm_runtime_dont_use_autosuspend(&plat_dev->dev);
> + pm_runtime_disable(&plat_dev->dev);
> +
> pvr_queue_device_fini(pvr_dev);
>
> err_context_fini:
> @@ -1475,6 +1478,8 @@ static void pvr_remove(struct
> platform_device *plat_dev)
> pvr_device_fini(pvr_dev);
> drm_dev_unplug(drm_dev);
> pvr_watchdog_fini(pvr_dev);
> + pm_runtime_dont_use_autosuspend(&plat_dev->dev);
> + pm_runtime_disable(&plat_dev->dev);
> pvr_queue_device_fini(pvr_dev);
> pvr_context_device_fini(pvr_dev);
> pvr_power_domains_fini(pvr_dev);
>
> did not fix the issue. Calling pm_runtime_force_suspend() instead of
> pm_runtime_dont_use_autosuspend() also didn't help.
>
> Do you have a clue?
Not yet, but we'll look into it :)
Cheers,
Matt
> Thanks!
>
> Gr{oetje,eeting}s,
>
> Geert
>
--
Matt Coster
E: [email protected]
OpenPGP_signature.asc
Description: OpenPGP digital signature
