On 10/29/25 3:08 PM, Geert Uytterhoeven wrote:

Hello Geert,

      default_suspend_ok+0xb4/0x20c (P)
      genpd_runtime_suspend+0x11c/0x4e0
      __rpm_callback+0x44/0x1cc
      rpm_callback+0x6c/0x78
      rpm_suspend+0x108/0x564
      pm_runtime_work+0xb8/0xbc
      process_one_work+0x144/0x280
      worker_thread+0x2c8/0x3d0
      kthread+0x128/0x1e0
      ret_from_fork+0x10/0x20
     Code: aa1303e0 52800863 b0005661 912dc021 (f9402095)
     ---[ end trace 0000000000000000 ]---

This driver uses manual PM Domain handling for multiple PM Domains.  In
my case, pvr_power_domains_fini() calls dev_pm_domain_detach() (twice),
which calls dev_pm_put_subsys_data(), and sets dev->power.subsys_data to
NULL when psd->refcount reaches zero.

Later/in parallel, default_suspend_ok() calls dev_gpd_data():

     static inline struct generic_pm_domain_data *dev_gpd_data(struct
device *dev)
     {
             return to_gpd_data(dev->power.subsys_data->domain_data);
     }

triggering the NULL pointer dereference.  Depending on timing, it may
crash earlier or later in genpd_runtime_suspend(), or not crash at all
(initially, I saw it only with extra debug prints in the genpd subsystem).

I came to the same conclusion when revisiting it yesterday and today.

The power 3dg-{a,b} domains are in RPM_SUSPENDING state, the __rpm_callback() is running and it unlocks dev->power.lock spinlock for just long enough, that the pvr_power_domains_fini() can issue dev_pm_domain_detach() and then dev_pm_put_subsys_data() , which unsets subsys_data, which are later still used by the __rpm_callback() (really the genpd_runtime_suspend() -> suspend_ok() it calls for this domain).

But, I wonder if the problem is actually in the CPG MSSR clock domain driver. The pvr_power_domains_fini() dev_pm_domain_detach() really calls cpg_mssr_detach_dev() which calls pm_clk_destroy() and that invokes the dev_pm_domain_detach() which unsets the subsys_data . The pm_clk_destroy() documentation is explicit about it unsetting the subsys_data .

I wonder if what we need to do instead, is patch the CPG MSSR clock domain driver such, that it would surely NOT call pm_clk_destroy() before the domain transitioned from RPM_SUSPENDING -> RPM_SUSPENDED state and surely is done with all its __rpm_callback() invocations ?

Can you please test this change and see if it fixes the problem ?

The barrier should guarantee that the domain is settled and no more callbacks are still running.

"
diff --git a/drivers/clk/renesas/renesas-cpg-mssr.c b/drivers/clk/renesas/renesas-cpg-mssr.c
index 7f9b7aa397906..14158cab1b129 100644
--- a/drivers/clk/renesas/renesas-cpg-mssr.c
+++ b/drivers/clk/renesas/renesas-cpg-mssr.c
@@ -24,6 +24,7 @@
 #include <linux/platform_device.h>
 #include <linux/pm_clock.h>
 #include <linux/pm_domain.h>
+#include <linux/pm_runtime.h>
 #include <linux/psci.h>
 #include <linux/reset-controller.h>
 #include <linux/slab.h>
@@ -656,8 +657,10 @@ int cpg_mssr_attach_dev(struct generic_pm_domain *unused, struct device *dev)

void cpg_mssr_detach_dev(struct generic_pm_domain *unused, struct device *dev)
 {
-       if (!pm_clk_no_clocks(dev))
+       if (!pm_clk_no_clocks(dev)) {
+               pm_runtime_barrier(dev);
                pm_clk_destroy(dev);
+       }
 }

 static void cpg_mssr_genpd_remove(void *data)
"

Reply via email to