Hello Chaoyi,

Thanks -- this is exactly what I needed.

- v2/DTE: will do. I'll keep building on Simon's per-device-ops series -- with
  that in place the NPU MMU can use the 32-bit-DTE ops (the per-ops GFP_DMA32
  that's already in mainline) without the global rk_ops conflict. I'll keep it
  as a stated dependency in the v4 cover letter.

- vdd_npu: I'll switch the RK3568 NPU power domain to need_regulator +
  domain-supply = <&vdd_npu> and drop the regulator-always-on workaround. I
  suspect that's also the right fix for the power-off/on de-idle issue I
  described -- the always-on was really just papering over the domain not
  being modelled with a regulator. I'll confirm on the board.

- AUTO_GATING: thanks for the commit references -- I'll keep the bit-31
  read-modify-write form with your Suggested-by and write the comment from
  those. For the record: on v7.1-rc6 the NPU MMU also completes translations
  on the reset value (I couldn't reproduce a page-walk stall without the
  write), so I'll note in the commit that it matches the vendor clock-gating
  handling rather than fixing a failure I can reproduce here -- happy to drop
  it if the iommu maintainers would prefer.

- PVTPLL/NoC: I'll follow up with Finley. First I'll check whether the
  need_regulator change resolves the NoC re-power de-idle on its own; if it
  still fails, I'll bring him the details (the genpd power-on de-idle ack and
  the BUS_IDLE_ST state).
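
For illustration only, the "bit-31 read-modify-write form" from the
AUTO_GATING item could be sketched roughly as below. This is not the patch
itself: the macro name, the helper name, and the choice to *set* (rather than
clear) bit 31 are assumptions made for the example, and a plain pointer
stands in for the MMIO register that a real driver would access through
readl()/writel().

```c
#include <stdint.h>

/* Hypothetical name; only the bit position (31) comes from the text above. */
#define AUTO_GATING_BIT31 (UINT32_C(1) << 31)

/*
 * Read-modify-write on a register stand-in: read the current value, change
 * only bit 31, and write the result back so every other bit keeps whatever
 * value it had (for example its reset value).
 */
static uint32_t set_bit31_rmw(volatile uint32_t *reg)
{
	uint32_t val = *reg;       /* read */

	val |= AUTO_GATING_BIT31;  /* modify: touch bit 31 only */
	*reg = val;                /* write back */

	return val;
}
```

The point of the read-modify-write form, as opposed to writing a constant,
is that the remaining gating bits are left exactly as the hardware or
firmware set them.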

I'll send a v4 with these. Thanks again for the quick, detailed answers.

Kind regards,
Midgy

On Mon, 8 Jun 2026 at 03:40, Chaoyi Chen <[email protected]> wrote:
>
> Hi Midgy,
>
> On 6/8/2026 5:03 AM, Midgy Balon wrote:
> > Hi Chaoyi,
> >
> > Thanks a lot for looking at this -- input from Rockchip is exactly what this
> > series needs.
> >
> >> Hmmm. If I understand correctly, the NPU IOMMU should be v2 rather than v1,
> >> implying it should support 40-bit PAs. Nevertheless, please note that the
> >> upper limit for DTE is 32 bits.
> >
> > Understood, and that 32-bit-DTE note is the crux of the trouble I had, so let
> > me lay out what I see and ask how you'd prefer to solve it.
> >
> > The mainline node is already v2 (rockchip,rk3568-iommu in rk356x-base.dtsi).
> > The problem on this 8 GiB board: with the v2 ops the page-table allocations
> > (gfp_flags == 0) can land above 4 GiB, so the DTE ends up > 32 bits and the
> > NPU's first translation faults with DMA_READ_ERROR. To work around that I had
> > switched the NPU MMU to the v1 compatible (rockchip,iommu), whose ops set
> > GFP_DMA32 and keep the DTE sub-4 GiB. That works in isolation, but because the
> > driver keeps a single global rk_ops, a v1 NPU MMU then trips
> > WARN_ON(rk_ops != ops) against the SoC's v2 instances (VOP/VDEC), which is why
> > I based the series on Simon's per-device-ops work.
> >
> > So my question: with per-device ops in place, what's the intended way to keep
> > the NPU MMU on v2 *and* cap its DTE at 32 bits on boards with >4 GiB of RAM?
> > A v2 ops variant carrying GFP_DMA32 for this device, or is there a register/
> > config bit that constrains the DTE address? I'd rather follow the Rockchip
> > intent here than carry the v1 workaround. (Simon, cc'd -- this is right next to
> > your per-device-ops series.)
> >
>
> If Simon's method works, please use it :)
>
> >> Can these operations not be completed via the pmdomain driver?
> >> If some operations are controlled by TF-A, are you using open source TF-A?
> >
> > Most of it is in pmdomain already. Power-on and NoC de-idle are done by the
> > RK3568 NPU power domain (genpd) at power-on -- the driver no longer pokes the
> > PMU directly. Two things remain outside it:
> >
> >  - vdd_npu: I mark it regulator-always-on in DT rather than wiring it as the
> >    domain's domain-supply, because as a domain-supply it created a device-link
> >    to the I2C PMIC (rk809) and genpd's power-off QoS-save path then hung
> >    reading the NPU QoS registers behind the (gated) NoC. If there's a clean way
> >    to let genpd own vdd_npu without that I2C ordering deadlock I'd much prefer
> >    that -- pointers welcome.
> >
>
> Please refer to the patch below regarding the RK3588 NPU pmdomain.
> In short, you need to set a "need_regulator" for the RK3568 NPU pmdomain.
>
> https://lore.kernel.org/all/[email protected]/
>
> >  - the NPU compute clock (PVTPLL): set from the driver via SCMI, and only
> >    needed for actual compute, not for bring-up.
> >
> > One more pmdomain observation from testing, possibly relevant to how the NPU
> > domain should be modelled: the domain's power-off/on cycle doesn't reliably
> > re-de-idle the NoC. If the NPU is probed after genpd has already powered the
> > (unused) domain off, the power-on de-idle fails ("failed to set idle on domain
> > 'npu'") and the NPU IOMMU then takes an external abort on its first MMIO access.
> > Probing the NPU before the unused-domain power-off, or marking the domain
> > always-on, both avoid it. Is the NoC de-idle expected to work on a genpd
> > re-power here, or should this domain effectively stay on?
> >
>
> Not quite sure what's going on with PVTPLL and NOC.
> Maybe @Finley knows about this?
>
> > On TF-A: yes -- bl31 is built from upstream arm-trusted-firmware
> > (github.com/ARM-software/arm-trusted-firmware, RK3568 platform), providing PSCI
> > and the SCMI clock service. The only closed blob in the boot chain is Rockchip's
> > DDR init (rkbin), which is the standard situation for mainline RK356x.
>
> --
> Best,
> Chaoyi
