Hello Chaoyi & Diederik,

I compared the RK3568 and RK3588 NPU power-domain definitions and DTS
as you suggested, and the result lines up exactly with what you
described.

The difference is the `need_regulator` capability. RK3588's NPU domain
is `DOMAIN_RK3588("npu", …, false, true)`, where the trailing `true` is
`regulator`/`need_regulator`. The mainline RK3568 macro
`DOMAIN_RK3568(name, pwr, req, wakeup)` has no regulator parameter at
all, so `RK3568_PD_NPU` can't be marked `need_regulator`. My v4 adds
that: a regulator-capable RK3568 NPU domain (`need_regulator = true`)
plus `domain-supply = <&vdd_npu>` on the NPU power-domain node, i.e.
the same shape as RK3588.
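
To make the difference concrete, here is a minimal standalone sketch in
plain userspace C. It is not the kernel code: every name with a
`sketch_`/`SKETCH_` prefix is made up for illustration and the mask
values are placeholders. Only the idea of a trailing regulator flag
mirrors what the RK3588 macro does.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for the per-domain info. */
struct sketch_domain_info {
	const char *name;
	unsigned int pwr_mask;   /* placeholder, not a real register bit */
	unsigned int req_mask;   /* placeholder, not a real register bit */
	bool active_wakeup;
	bool need_regulator;     /* the capability discussed above */
};

/* Shape of the current RK3568-style macro: no regulator argument. */
#define SKETCH_DOMAIN(_name, pwr, req, wakeup) \
	{ .name = _name, .pwr_mask = (pwr), .req_mask = (req), \
	  .active_wakeup = (wakeup), .need_regulator = false }

/* Regulator-capable variant: same fields plus a trailing flag. */
#define SKETCH_DOMAIN_R(_name, pwr, req, wakeup, regulator) \
	{ .name = _name, .pwr_mask = (pwr), .req_mask = (req), \
	  .active_wakeup = (wakeup), .need_regulator = (regulator) }

static const struct sketch_domain_info sketch_domains[] = {
	SKETCH_DOMAIN("gpu", 1u << 0, 1u << 1, false),
	SKETCH_DOMAIN_R("npu", 1u << 1, 1u << 2, false, true),
};

/* Look up a domain by name and report whether it needs a regulator. */
static bool sketch_needs_regulator(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_domains) / sizeof(sketch_domains[0]); i++)
		if (strcmp(sketch_domains[i].name, name) == 0)
			return sketch_domains[i].need_regulator;
	return false; /* unknown domains are treated as "no regulator" */
}
```

With the first macro shape there is simply no way to set the flag per
domain, which is why a second, regulator-aware variant is needed.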

The fix you referenced (Frank Zhang's "pmdomain: rockchip: Fix init
genpd as GENPD_STATE_ON before regulator ready", plus "quiet regulator
error on -EPROBE_DEFER") is already in my base (v7.1-rc6), so the
`if (need_regulator) rockchip_pd_power(pd, false)` default-off path is
in effect. That is what resolves the actual problem for me: with rocket
built as a module (the normal config), need_regulator on the NPU
domain, and those pmdomain patches in place, the board boots cleanly
and NPU jobs run with no RCU stall and no deadlock. My earlier hang was
an artifact of a self-contained rocket=y image probing in the initcalls
before the I2C regulator core was up. As a module it loads ~6.8 s in,
well after that point, so the hang is gone.
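
As I understand the effect of that default-off path, it can be modelled
roughly like this. This is a userspace sketch under my own assumptions,
not the driver code: `sketch_pd`, `sketch_pd_power()` and
`sketch_pd_init_is_off()` are hypothetical names, and the real driver
does considerably more.

```c
#include <stdbool.h>

/* Hypothetical model of one power domain at controller probe time. */
struct sketch_pd {
	bool hw_on;           /* state left behind by the bootloader */
	bool need_regulator;  /* supply comes from an external regulator */
};

/* Stand-in for switching the domain; here it only tracks the state. */
static void sketch_pd_power(struct sketch_pd *pd, bool on)
{
	pd->hw_on = on;
}

/*
 * Returns true if the domain should be registered as "off".
 * A regulator-backed domain is forced off first, so nothing assumes it
 * is powered before its regulator can actually be obtained.
 */
static bool sketch_pd_init_is_off(struct sketch_pd *pd)
{
	if (pd->need_regulator)
		sketch_pd_power(pd, false);
	return !pd->hw_on;
}
```

A domain without the flag keeps whatever state the hardware reports,
while the NPU domain always starts out off.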

I also went back and checked the `fw_devlink=permissive` question
myself, and the good news is that it is NOT needed. I rebooted the
exact same kernel with permissive removed from the cmdline (i.e. the
default, enforcing fw_devlink behaviour), and the board boots cleanly,
the NPU probes (`rocket fde40000.npu: Rockchip NPU core 0 version: 0`),
and NPU jobs submit and run five times in a row with no deadlock and no
RCU stall. So the default fw_devlink mode resolves the NPU/PMIC
ordering fine via deferred probe.
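
For anyone following along, the deferred-probe idea can be illustrated
with a toy model. It is a userspace sketch with hypothetical `sketch_*`
names: the real driver core retries deferred devices when other probes
succeed rather than in numbered passes, and the only thing taken from
the kernel is the value of `EPROBE_DEFER`.

```c
#include <stdbool.h>

#define EPROBE_DEFER 517  /* kernel-internal errno, redefined for the sketch */

static bool sketch_supplier_ready;  /* models "PMIC regulator registered" */

/* Consumer probe: ask to be retried until the supplier exists. */
static int sketch_consumer_probe(void)
{
	if (!sketch_supplier_ready)
		return -EPROBE_DEFER;
	return 0;
}

/*
 * Toy retry loop: the supplier shows up on pass `supplier_pass`.
 * Returns the pass on which the consumer finally probes, or -1 if it
 * never does within `max_passes`.
 */
static int sketch_run_probe_passes(int supplier_pass, int max_passes)
{
	int pass;

	sketch_supplier_ready = false;
	for (pass = 1; pass <= max_passes; pass++) {
		if (pass == supplier_pass)
			sketch_supplier_ready = true;
		if (sketch_consumer_probe() == 0)
			return pass;
	}
	return -1;
}
```

The consumer never fails hard; it just keeps deferring until the
supplier is there, which matches the behaviour I see on the board.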

The one remaining thing is cosmetic: at power-domain-controller probe
(~2.94 s) I still get, in BOTH modes (with or without permissive):

  rockchip-pm-domain …: Failed to create device link (0x180) with supplier 0-0020 …power-domain@6

i.e. genpd can't form the link to the rk809 (the I2C PMIC supplying
vdd_npu) because the PMIC isn't registered yet at that point. It's
non-fatal: the domain defaults off (Frank's patch), the rail comes up
via the regulator core, the NPU probes a few seconds later, and all
jobs run.
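
In case it helps when reading that message, here is a small sketch that
decodes the `0x180` flags value. The bit table is an assumption on my
side: I copied the `DL_FLAG_*` positions by hand from
include/linux/device.h, so please check them against your tree. The
`sketch_*` names are hypothetical.

```c
#include <stddef.h>

struct sketch_dl_flag {
	unsigned int bit;
	const char *name;
};

/* Assumed DL_FLAG_* bit positions (hand-copied, verify before relying). */
static const struct sketch_dl_flag sketch_dl_flags[] = {
	{ 1u << 0, "DL_FLAG_STATELESS" },
	{ 1u << 1, "DL_FLAG_AUTOREMOVE_CONSUMER" },
	{ 1u << 2, "DL_FLAG_PM_RUNTIME" },
	{ 1u << 3, "DL_FLAG_RPM_ACTIVE" },
	{ 1u << 4, "DL_FLAG_AUTOREMOVE_SUPPLIER" },
	{ 1u << 5, "DL_FLAG_AUTOPROBE_CONSUMER" },
	{ 1u << 6, "DL_FLAG_MANAGED" },
	{ 1u << 7, "DL_FLAG_SYNC_STATE_ONLY" },
	{ 1u << 8, "DL_FLAG_INFERRED" },
	{ 1u << 9, "DL_FLAG_CYCLE" },
};

/*
 * Store the names of all flags set in `flags` into out[] (at most `max`
 * entries, lowest bit first) and return how many were stored.
 */
static size_t sketch_decode_dl_flags(unsigned int flags,
				     const char **out, size_t max)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(sketch_dl_flags) / sizeof(sketch_dl_flags[0]); i++)
		if ((flags & sketch_dl_flags[i].bit) && n < max)
			out[n++] = sketch_dl_flags[i].name;
	return n;
}
```

If the table is right, `0x180` is bits 7 and 8, i.e.
`DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_INFERRED`.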

One question: on RK3588 with need_regulator, do you also see that
"Failed to create device link … supplier <pmic>" line at pmdomain
probe, or does it order cleanly? If RK3588 is clean, is there a DTS
detail (e.g. the regulator's bus/probe order) I should mirror on RK3568
to make the link form in time, or is this line just expected/harmless
and best left as-is?

@Diederik: thanks. The DCDC_REG2 change and Jonas's USB-suspend series
look like generally useful RK356x robustness fixes, though for this
specific NPU device-link issue need_regulator + Frank's pmdomain
patches seem to be the relevant piece. I'll keep both in mind for
suspend.

The convolution-output / compute-completion issue is still separate and
open (@Finley: that's the PVTPLL/NoC one); the power-domain side is in
good shape for v4.

Thanks y'all for your help :)

Kind regards,
Midgy

On Wed, 10 Jun 2026 at 12:05, Diederik de Haas
<[email protected]> wrote:
>
> Hi,
>
> On Wed Jun 10, 2026 at 3:14 AM CEST, Chaoyi Chen wrote:
> > Hi Midgy,
> >
> > On 6/9/2026 7:11 PM, Midgy Balon wrote:
> >> Hello Chaoyi,
> >>
> >> You were right - building rocket as a module fixes it. Thanks for the
> >> pointer.
> >>
> >> I rebuilt with CONFIG_DRM_ACCEL_ROCKET=m (everything else the same:
> >> need_regulator on the RK3568 NPU power domain via a DOMAIN_M_R
> >> variant, domain-supply = <&vdd_npu>, and the regulator-always-on
> >> workaround dropped). The board now boots cleanly and, more
> >> importantly, an NPU job submit no longer hangs: I ran the test
> >> workload five times with no RCU stall and no freeze.
> >>
> >> So with rocket=m the need_regulator approach works on RK3568, and
> >> I'll keep it for v4 (domain-supply + need_regulator, instead of
> >> marking vdd_npu always-on). rocket=m is the normal configuration
> >> anyway; my earlier hang came from building it =y in a
> >> self-contained image, so it probed in the initcalls (around 2 s)
> >> and the genpd -> I2C-PMIC regulator transition ran before the
> >> system was ready. As a module it loads from udev much later
> >> (~6.8 s here), after the I2C controller and regulator core are
> >> fully up.
> >>
> >> On your question of when the device-link error is printed - it is
> >> at power-domain controller probe, not at the rocket probe:
> >>
> >> [ 2.700618] vdd_npu: Bringing 500000uV into 825000-825000uV
> >> [ 2.749637] rockchip-pm-domain fdd90000.power-management:power-controller: Failed to create device link (0x180) with supplier 0-0020 for /power-management@fdd90000/power-controller/power-domain@6
> >> [ 2.945955] platform fde40000.npu: Adding to iommu group 3
> >> ...
> >> [ 6.840374] rocket: loading out-of-tree module taints kernel.
> >> [ 6.877647] [drm] Initialized rocket 0.0.0 for rknn on minor 0
> >> [ 6.879950] rocket fde40000.npu: Rockchip NPU core 0 version: 0
> >>
> >> So the device-link to the rk809 PMIC (0-0020) fails to form at
> >> ~2.75 s, well before rocket loads at ~6.8 s. It is non-fatal here -
> >> the vdd_npu rail is brought up by the regulator core and all jobs
> >> run - and there is no "failed to get ack on domain npu" NoC warning
> >> this boot (the always-on kernel had one). The complete boot log is
> >> attached.
> >>
> >> Two notes / one question:
> >> - This boot used fw_devlink=permissive on the command line. Is the
> >>   "Failed to create device link ... supplier 0-0020" at pmdomain
> >>   probe expected/benign, or is there a clean way to make it order
> >>   correctly (so it also works without permissive, and a =y build
> >>   wouldn't deadlock in the initcalls)?
> >
> > We encountered the same issue on the RK3588 NPU before. And it was
> > resolved with the following patch at that time.
> >
> > https://lore.kernel.org/all/[email protected]/
> >
> > Please compare the differences in NPU pmdomain and DTS configuration
> > between the RK3568 and RK3588.
>
> About a month ago on #linux-rockchip we were discussing PM 'stuff':
> https://libera.catirclogs.org/linux-rockchip/2026-05-15#39939137;
> which references this paste
> https://paste.sr.ht/~diederik/89d9f84e22474e837b55286d213b67f03859ce2e
> I've since removed the DCDC_REG2 for PineTab2 and the 'fix' should likely
> be extended to cover all RK3566/RK3568 devices though.
>
> It's what I made at the time hoping to fix a suspend/resume issue when
> trying upstream TF-A. It didn't fix the issue at the time, but may still
> be useful/needed and I think it's what Chaoyi hinted at.
>
> Just yesterday, Jonas posted this patch which may be useful/needed too:
> https://lore.kernel.org/linux-rockchip/[email protected]/
>
> HTH,
> Diederik
>
> >> - (The convolution output is still uniform zero-point / the job
> >>   times out - that is the separate NPU compute-completion issue,
> >>   unrelated to the power-domain work. Finley, that is the one I
> >>   flagged earlier re PVTPLL/NoC.)
> >>
> >> Kind regards,
> >> Midgy
> >>
>