On 10/29/25 3:08 PM, Geert Uytterhoeven wrote:
Hello Geert,
default_suspend_ok+0xb4/0x20c (P)
genpd_runtime_suspend+0x11c/0x4e0
__rpm_callback+0x44/0x1cc
rpm_callback+0x6c/0x78
rpm_suspend+0x108/0x564
pm_runtime_work+0xb8/0xbc
process_one_work+0x144/0x280
worker_thread+0x2c8/0x3d0
kthread+0x128/0x1e0
ret_from_fork+0x10/0x20
Code: aa1303e0 52800863 b0005661 912dc021 (f9402095)
---[ end trace 0000000000000000 ]---
This driver uses manual PM Domain handling for multiple PM Domains. In
my case, pvr_power_domains_fini() calls dev_pm_domain_detach() (twice),
which calls dev_pm_put_subsys_data(), and sets dev->power.subsys_data to
NULL when psd->refcount reaches zero.
Later/in parallel, default_suspend_ok() calls dev_gpd_data():
    static inline struct generic_pm_domain_data *dev_gpd_data(struct device *dev)
    {
    	return to_gpd_data(dev->power.subsys_data->domain_data);
    }
triggering the NULL pointer dereference. Depending on timing, it may
crash earlier or later in genpd_runtime_suspend(), or not crash at all
(initially, I saw it only with extra debug prints in the genpd subsystem).
I came to the same conclusion when revisiting it yesterday and today.
The 3dg-{a,b} power domains are in RPM_SUSPENDING state and
__rpm_callback() is running. It drops the dev->power.lock spinlock for
just long enough that pvr_power_domains_fini() can issue
dev_pm_domain_detach() and then dev_pm_put_subsys_data(), which clears
subsys_data while __rpm_callback() (really the genpd_runtime_suspend()
-> suspend_ok() it calls for this domain) is still using it.
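
To make the interleaving above concrete, here is a minimal userspace
model of it. This is only a sketch, not kernel code: every model_* name
is hypothetical, the initial refcount of 2 is an assumption chosen to
match the two detach calls, and the model returns -1 at the point where
the real code would dereference the NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only; the model_* names are hypothetical, not kernel API. */
struct model_subsys_data {
	int refcount;
	int domain_data;	/* stands in for the genpd per-device data */
};

struct model_device {
	struct model_subsys_data *subsys_data;
};

/*
 * Models the effect described for dev_pm_put_subsys_data(): drop one
 * reference and clear the pointer once the count reaches zero.
 */
static void model_put_subsys_data(struct model_device *dev)
{
	struct model_subsys_data *psd = dev->subsys_data;

	if (psd && --psd->refcount == 0) {
		dev->subsys_data = NULL;
		free(psd);
	}
}

/*
 * Stand-in for the suspend_ok() path. Returns -1 where the real
 * dev_gpd_data() would dereference a NULL subsys_data and oops.
 */
static int model_suspend_ok(struct model_device *dev)
{
	if (!dev->subsys_data)
		return -1;
	return dev->subsys_data->domain_data;
}

/* Replays the problematic ordering step by step in a single thread. */
static int model_replay_race(void)
{
	struct model_device dev;

	dev.subsys_data = malloc(sizeof(*dev.subsys_data));
	if (!dev.subsys_data)
		return -2;
	dev.subsys_data->refcount = 2;	/* assumption: one per detach call */
	dev.subsys_data->domain_data = 42;

	/* Step 1: the suspend callback is in flight and the lock is dropped. */

	/* Step 2: the driver detaches both PM domains in the meantime. */
	model_put_subsys_data(&dev);
	assert(dev.subsys_data != NULL);	/* one reference still held */
	model_put_subsys_data(&dev);

	/* Step 3: the callback continues and looks at subsys_data. */
	return model_suspend_ok(&dev);
}
```

In this model, model_replay_race() reports the failure case because the
second put clears subsys_data before the callback gets to look at it.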
But I wonder if the problem is actually in the CPG MSSR clock domain
driver. The dev_pm_domain_detach() issued by pvr_power_domains_fini()
ends up calling cpg_mssr_detach_dev(), which calls pm_clk_destroy(),
and that in turn invokes dev_pm_put_subsys_data(), which clears
subsys_data. The pm_clk_destroy() documentation is explicit about it
clearing subsys_data.
I wonder if what we need to do instead is patch the CPG MSSR clock
domain driver so that it does NOT call pm_clk_destroy() until the
domain has transitioned from RPM_SUSPENDING to RPM_SUSPENDED and is
done with all its __rpm_callback() invocations?
Can you please test this change and see if it fixes the problem?
The barrier should guarantee that the domain has settled and that no
callbacks are still running.
"
diff --git a/drivers/clk/renesas/renesas-cpg-mssr.c b/drivers/clk/renesas/renesas-cpg-mssr.c
index 7f9b7aa397906..14158cab1b129 100644
--- a/drivers/clk/renesas/renesas-cpg-mssr.c
+++ b/drivers/clk/renesas/renesas-cpg-mssr.c
@@ -24,6 +24,7 @@
 #include <linux/platform_device.h>
 #include <linux/pm_clock.h>
 #include <linux/pm_domain.h>
+#include <linux/pm_runtime.h>
 #include <linux/psci.h>
 #include <linux/reset-controller.h>
 #include <linux/slab.h>
@@ -656,8 +657,10 @@ int cpg_mssr_attach_dev(struct generic_pm_domain *unused, struct device *dev)
 
 void cpg_mssr_detach_dev(struct generic_pm_domain *unused, struct device *dev)
 {
-	if (!pm_clk_no_clocks(dev))
+	if (!pm_clk_no_clocks(dev)) {
+		pm_runtime_barrier(dev);
 		pm_clk_destroy(dev);
+	}
 }
 
 static void cpg_mssr_genpd_remove(void *data)
"
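
For illustration, the ordering the patch is after (wait for the
in-flight callback to finish, then tear down the data it uses) can be
sketched in userspace with pthreads. This is only an analogy under
stated assumptions: the model_* names are hypothetical, the condition
variable stands in for whatever pm_runtime_barrier() does internally,
and the sketch says nothing about how the kernel implements it. Build
with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Userspace analogy only; the model_* names are hypothetical. */
struct model_dev {
	pthread_mutex_t lock;
	pthread_cond_t idle;
	int in_flight;	/* 1 while the "suspend callback" is running */
	int *data;	/* stands in for subsys_data */
	int seen;	/* what the callback read, or -1 if data was gone */
};

/* The in-flight callback: uses dev->data without holding the lock. */
static void *model_suspend_cb(void *arg)
{
	struct model_dev *dev = arg;

	dev->seen = dev->data ? *dev->data : -1;

	pthread_mutex_lock(&dev->lock);
	dev->in_flight = 0;
	pthread_cond_broadcast(&dev->idle);
	pthread_mutex_unlock(&dev->lock);
	return NULL;
}

/* Blocks until no callback is in flight any more. */
static void model_barrier(struct model_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	while (dev->in_flight)
		pthread_cond_wait(&dev->idle, &dev->lock);
	pthread_mutex_unlock(&dev->lock);
}

/* Detach path: settle first, and only then destroy the shared data. */
static void model_detach(struct model_dev *dev)
{
	model_barrier(dev);
	assert(!dev->in_flight);
	free(dev->data);
	dev->data = NULL;
}

static int model_run_with_barrier(void)
{
	struct model_dev dev = { .in_flight = 1, .seen = 0 };
	pthread_t cb;

	pthread_mutex_init(&dev.lock, NULL);
	pthread_cond_init(&dev.idle, NULL);

	dev.data = malloc(sizeof(*dev.data));
	if (!dev.data)
		return -2;
	*dev.data = 42;

	/* The callback is marked in flight before the detach can start. */
	if (pthread_create(&cb, NULL, model_suspend_cb, &dev)) {
		free(dev.data);
		return -2;
	}

	model_detach(&dev);
	pthread_join(cb, NULL);

	pthread_cond_destroy(&dev.idle);
	pthread_mutex_destroy(&dev.lock);
	return dev.seen;
}
```

Because model_detach() frees the data only after the callback has
cleared in_flight, the callback always reads valid data in this model,
regardless of how the two threads are scheduled.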