On Wed Jun 10, 2026 at 3:36 PM CEST, Midgy Balon wrote:
> Hello Chaoyi & Diederik,
>
> I compared the RK3568 and RK3588 NPU power-domain + DTS as you
> suggested, and it lines up
> exactly with what you described.
>
> The difference is the `need_regulator` capability. RK3588's NPU domain is
> `DOMAIN_RK3588("npu", …, false, true)` — the trailing `true` is
> `regulator`/`need_regulator`.
> The mainline RK3568 macro `DOMAIN_RK3568(name, pwr, req, wakeup)` has
> no regulator parameter at
> all, so `RK3568_PD_NPU` can't be marked need_regulator. My v4 adds
> that: a regulator-capable
> RK3568 NPU domain (need_regulator = true) plus `domain-supply =
> <&vdd_npu>` on the NPU node —
> i.e. the same shape as RK3588.
>
> And the fix you referenced (Frank Zhang's "pmdomain: rockchip: Fix init genpd as
> GENPD_STATE_ON before regulator ready", plus "quiet regulator error on
> -EPROBE_DEFER") is
> already in my base (v7.1-rc6), so the `if (need_regulator)
> rockchip_pd_power(pd, false)`
> default-off path is in effect. That's what resolves the actual problem
> for me: with rocket
> built as a module (the normal config), need_regulator on the NPU
> domain, and those pmdomain
> patches in place, the board boots cleanly and NPU jobs run with no RCU
> stall / no deadlock. My
> earlier hang was an artifact of a self-contained rocket=y image
> probing in the initcalls before
> the I2C regulator core was up — as a module it loads ~6.8 s in, well
> after, so it's gone.
>
> I also went back and checked the `fw_devlink=permissive` question
> myself — and good news, it
> turns out it is NOT needed. I rebooted the exact same kernel with
> permissive removed from the
> cmdline (strict fw_devlink, the default), and the board boots cleanly,
> the NPU probes
> (`rocket fde40000.npu: Rockchip NPU core 0 version: 0`), and NPU jobs
> submit and run five times
> in a row with no deadlock and no RCU stall. So strict fw_devlink
> resolves the NPU/PMIC ordering
> fine via deferred probe.
>
> The one remaining thing is cosmetic: at power-domain-controller probe
> (~2.94 s) I still get,
> in BOTH modes (with or without permissive):
>
>   rockchip-pm-domain …: Failed to create device link (0x180) with supplier 0-0020 …power-domain@6
>
> i.e. genpd can't form the link to the rk809 (the I2C PMIC supplying
> vdd_npu) because the PMIC
> isn't registered yet at that point. It's non-fatal — the domain
> defaults off (Frank's patch),
> the rail comes up via the regulator core, the NPU probes a few seconds
> later, and all jobs run.
>
> One question: on RK3588 with need_regulator, do you also see that
> "Failed to create device
> link … supplier <pmic>" line at pmdomain probe, or does it order
> cleanly? If RK3588 is clean,
> is there a DTS detail (e.g. the regulator's bus/probe order) I should
> mirror on RK3568 to make
> the link form in time — or is this line just expected/harmless and
> best left as-is?

[    2.110935] rockchip-pm-domain fd8d8000.power-management:power-controller: Failed to create device link (0x180) with supplier 2-0042 for /power-management@fd8d8000/power-controller/power-domain@8
[    2.557459] sdhci-dwcmshc fe2e0000.mmc: Can't reduce the clock below 52MHz in HS200/HS400 mode
[    2.647174] rockchip-pm-domain fd8d8000.power-management:power-controller: Failed to create device link (0x180) with supplier 2-0042 for /power-management@fd8d8000/power-controller/power-domain@8
[    2.945089] rockchip-pm-domain fd8d8000.power-management:power-controller: Failed to create device link (0x180) with supplier spi2.0 for /power-management@fd8d8000/power-controller/power-domain@12

Here power-domain@8 is the NPU and power-domain@12 is the GPU.

I get these lines on both nanopc-t6-lts and nanopc-t6-plus (both RK3588).
And in a 6.18 dmesg output I have for Rock 5B, I see roughly the same,
except that the supplier is 1-0042 instead of 2-0042.

I don't know if it's bad or harmless, but it is consistent.
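
In case it helps to compare boards, here is a rough Python sketch that pulls
the supplier / power-domain pairs out of a saved dmesg dump. The function name
`failed_links` and the regular expression are my own assumptions, modelled
only on the message format quoted above; this is not a kernel tool and has not
been checked against other message variants.

```python
import re

# Assumed message shape, taken from the dmesg lines quoted in this mail:
#   Failed to create device link (0x180) with supplier <dev> for <DT path>
LINK_FAIL = re.compile(
    r"Failed to create device link \((?P<flags>0x[0-9a-fA-F]+)\) "
    r"with supplier (?P<supplier>\S+) for (?P<consumer>\S+)"
)


def failed_links(dmesg_text):
    """Return sorted, de-duplicated (supplier, power-domain index) pairs."""
    # Mail clients wrap long log lines, so collapse every whitespace run
    # into a single space before matching.
    flat = " ".join(dmesg_text.split())
    pairs = set()
    for match in LINK_FAIL.finditer(flat):
        # The consumer is a devicetree path ending in "power-domain@<index>".
        index = match.group("consumer").rsplit("@", 1)[-1]
        pairs.add((match.group("supplier"), index))
    return sorted(pairs)
```

Calling `failed_links(open("dmesg.txt").read())` (the file name is just a
placeholder) on the RK3588 lines above should give the pairs 2-0042 / 8 and
spi2.0 / 12, which makes it easy to diff against an RK3568 log.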

HTH,
  Diederik

> @Diederik — thanks; the DCDC_REG2 change and Jonas's USB-suspend
> series look like generally
> useful RK356x robustness fixes, though for this specific NPU
> device-link the need_regulator +
> Frank's pmdomain patches seem to be the relevant piece. I'll keep them
> in mind for suspend.
>
> The convolution-output / compute-completion issue is still separate
> and open (@Finley — that's
> the PVTPLL/NoC one); the power-domain side is in good shape for v4.
>
> Thanks y'all for your help :)
>
> Kind regards,
> Midgy
>
> On Wed, 10 Jun 2026 at 12:05, Diederik de Haas
> <[email protected]> wrote:
>>
>> Hi,
>>
>> On Wed Jun 10, 2026 at 3:14 AM CEST, Chaoyi Chen wrote:
>> > Hi Midgy,
>> >
>> > On 6/9/2026 7:11 PM, Midgy Balon wrote:
>> >> Hello Chaoyi,
>> >>
>> >> You were right - building rocket as a module fixes it. Thanks for the 
>> >> pointer.
>> >>
>> >> I rebuilt with CONFIG_DRM_ACCEL_ROCKET=m (everything else the same:
>> >> need_regulator on
>> >> the RK3568 NPU power domain via a DOMAIN_M_R variant, domain-supply =
>> >> <&vdd_npu>, and the
>> >> regulator-always-on workaround dropped). The board now boots cleanly
>> >> and, more importantly,
>> >> an NPU job submit no longer hangs: I ran the test workload five times
>> >> with no RCU stall and
>> >> no freeze.
>> >>
>> >> So with rocket=m the need_regulator approach works on RK3568, and I'll
>> >> keep it for v4
>> >> (domain-supply + need_regulator, instead of marking vdd_npu
>> >> always-on). rocket=m is the
>> >> normal configuration anyway; my earlier hang came from building it =y
>> >> in a self-contained
>> >> image, so it probed in the initcalls (around 2 s) and the genpd ->
>> >> I2C-PMIC regulator
>> >> transition ran before the system was ready. As a module it loads from
>> >> udev much later
>> >> (~6.8 s here), after the I2C controller and regulator core are fully up.
>> >>
>> >> On your question of when the device-link error is printed - it is at
>> >> power-domain
>> >> controller probe, not at the rocket probe:
>> >>
>> >>   [    2.700618] vdd_npu: Bringing 500000uV into 825000-825000uV
>> >>   [    2.749637] rockchip-pm-domain fdd90000.power-management:power-controller: Failed to create device link (0x180) with supplier 0-0020 for /power-management@fdd90000/power-controller/power-domain@6
>> >>   [    2.945955] platform fde40000.npu: Adding to iommu group 3
>> >>   ...
>> >>   [    6.840374] rocket: loading out-of-tree module taints kernel.
>> >>   [    6.877647] [drm] Initialized rocket 0.0.0 for rknn on minor 0
>> >>   [    6.879950] rocket fde40000.npu: Rockchip NPU core 0 version: 0
>> >>
>> >> So the device-link to the rk809 PMIC (0-0020) fails to form at ~2.75
>> >> s, well before rocket
>> >> loads at ~6.8 s. It is non-fatal here - the vdd_npu rail is brought up
>> >> by the regulator core
>> >> and all jobs run - and there is no "failed to get ack on domain npu"
>> >> NoC warning this boot
>> >> (the always-on kernel had one). The complete boot log is attached.
>> >>
>> >> Two notes / one question:
>> >> - This boot used fw_devlink=permissive on the command line. Is the
>> >> "Failed to create device
>> >>   link ... supplier 0-0020" at pmdomain probe expected/benign, or is
>> >> there a clean way to make
>> >>   it order correctly (so it also works without permissive, and a =y
>> >> build wouldn't deadlock in
>> >>   the initcalls)?
>> >
>> > We encountered the same issue on the RK3588 NPU before. And it was
>> > resolved with the following patch at that time.
>> >
>> > https://lore.kernel.org/all/[email protected]/
>> >
>> > Please compare the differences in NPU pmdomain and DTS configuration
>> > between the RK3568 and RK3588.
>>
>> About a month ago on #linux-rockchip we were discussing PM 'stuff':
>> https://libera.catirclogs.org/linux-rockchip/2026-05-15#39939137;
>> which references this paste
>> https://paste.sr.ht/~diederik/89d9f84e22474e837b55286d213b67f03859ce2e
>> I've since removed the DCDC_REG2 for PineTab2 and the 'fix' should likely
>> be extended to cover all RK3566/RK3568 devices though.
>>
>> It's what I made at the time hoping to fix a suspend/resume issue when
>> trying upstream TF-A. It didn't fix the issue at the time, but may still
>> be useful/needed and I think it's what Chaoyi hinted at.
>>
>> Just yesterday, Jonas posted this patch which may be useful/needed too:
>> https://lore.kernel.org/linux-rockchip/[email protected]/
>>
>> HTH,
>>   Diederik
>>
>> >> - (The convolution output is still uniform zero-point / the job times
>> >> out - that is the
>> >>   separate NPU compute-completion issue, unrelated to the power-domain
>> >> work. Finley, that is
>> >>   the one I flagged earlier re PVTPLL/NoC.)
>> >>
>> >> Kind regards,
>> >> Midgy
>> >>
>>
>
> _______________________________________________
> Linux-rockchip mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-rockchip
