Hello Chaoyi,

Thanks -- this is exactly what I needed.

- v2/DTE: will do. I'll keep building on Simon's per-device-ops series; with
  that in place, the NPU MMU can use the 32-bit-DTE ops (the per-ops
  GFP_DMA32 that's already in mainline) without the global rk_ops conflict.
  I'll list the series as a stated dependency in the v4 cover letter.

- vdd_npu: I'll switch the RK3568 NPU power domain to need_regulator +
  domain-supply = <&vdd_npu> and drop the regulator-always-on workaround. I
  suspect that's also the right fix for the power-off/on de-idle issue I
  described: the always-on was really just papering over the domain not
  being modelled with a regulator. I'll confirm on the board.

- AUTO_GATING: thanks for the commit references. I'll keep the bit-31
  read-modify-write form with your Suggested-by and base the code comment on
  those commits. For the record: on v7.1-rc6 the NPU MMU also completes
  translations with the register at its reset value (I couldn't reproduce a
  page-walk stall without the write), so I'll note in the commit message
  that it matches the vendor clock-gating handling rather than fixing a
  failure I can reproduce here. Happy to drop it if the iommu maintainers
  would prefer.

- PVTPLL/NoC: I'll follow up with Finley. First I'll check whether the
  need_regulator change resolves the NoC re-power de-idle on its own; if it
  doesn't, I'll bring him the details (the genpd power-on de-idle ack and the
  BUS_IDLE_ST state).

I'll send a v4 with these changes.

Thanks again for the quick, detailed answers.

Kind regards,
Midgy

On Mon, 8 Jun 2026 at 03:40, Chaoyi Chen <[email protected]> wrote:
>
> Hi Midgy,
>
> On 6/8/2026 5:03 AM, Midgy Balon wrote:
> > Hi Chaoyi,
> >
> > Thanks a lot for looking at this -- input from Rockchip is exactly what this
> > series needs.
> >
> >> Hmmm. If I understand correctly, the NPU IOMMU should be v2 rather than v1,
> >> implying it should support 40-bit PAs. Nevertheless, please note that the
> >> upper limit for DTE is 32 bits.
> >
> > Understood, and that 32-bit-DTE note is the crux of the trouble I had, so
> > let me lay out what I see and ask how you'd prefer to solve it.
> >
> > The mainline node is already v2 (rockchip,rk3568-iommu in rk356x-base.dtsi).
> > The problem on this 8 GiB board: with the v2 ops the page-table allocations
> > (gfp_flags == 0) can land above 4 GiB, so the DTE ends up > 32 bits and the
> > NPU's first translation faults with DMA_READ_ERROR. To work around that I
> > had switched the NPU MMU to the v1 compatible (rockchip,iommu), whose ops
> > set GFP_DMA32 and keep the DTE sub-4 GiB. That works in isolation, but
> > because the driver keeps a single global rk_ops, a v1 NPU MMU then trips
> > WARN_ON(rk_ops != ops) against the SoC's v2 instances (VOP/VDEC), which is
> > why I based the series on Simon's per-device-ops work.
> >
> > So my question: with per-device ops in place, what's the intended way to
> > keep the NPU MMU on v2 *and* cap its DTE at 32 bits on boards with >4 GiB
> > of RAM? A v2 ops variant carrying GFP_DMA32 for this device, or is there a
> > register/config bit that constrains the DTE address? I'd rather follow the
> > Rockchip intent here than carry the v1 workaround. (Simon, cc'd -- this is
> > right next to your per-device-ops series.)
> >
>
> If Simon's method works, please use it :)
>
> >> Can these operations not be completed via the pmdomain driver?
> >> If some operations are controlled by TF-A, are you using open source TF-A?
> >
> > Most of it is in pmdomain already. Power-on and NoC de-idle are done by the
> > RK3568 NPU power domain (genpd) at power-on -- the driver no longer pokes
> > the PMU directly. Two things remain outside it:
> >
> > - vdd_npu: I mark it regulator-always-on in DT rather than wiring it as the
> > domain's domain-supply, because as a domain-supply it created a device-link
> > to the I2C PMIC (rk809) and genpd's power-off QoS-save path then hung
> > reading the NPU QoS registers behind the (gated) NoC. If there's a clean
> > way to let genpd own vdd_npu without that I2C ordering deadlock I'd much
> > prefer that -- pointers welcome.
> >
>
> Please refer to the patch below regarding the RK3588 NPU pmdomain.
> In short, you need to set a "need_regulator" for the RK3568 NPU pmdomain.
>
> https://lore.kernel.org/all/[email protected]/
>
> > - the NPU compute clock (PVTPLL): set from the driver via SCMI, and only
> > needed for actual compute, not for bring-up.
> >
> > One more pmdomain observation from testing, possibly relevant to how the NPU
> > domain should be modelled: the domain's power-off/on cycle doesn't reliably
> > re-de-idle the NoC. If the NPU is probed after genpd has already powered the
> > (unused) domain off, the power-on de-idle fails ("failed to set idle on
> > domain 'npu'") and the NPU IOMMU then takes an external abort on its first
> > MMIO access. Probing the NPU before the unused-domain power-off, or marking
> > the domain always-on, both avoid it. Is the NoC de-idle expected to work on
> > a genpd re-power here, or should this domain effectively stay on?
> >
>
> Not quite sure what's going on with PVTPLL and NOC.
> Maybe @Finley knows about this?
>
> > On TF-A: yes -- bl31 is built from upstream arm-trusted-firmware
> > (github.com/ARM-software/arm-trusted-firmware, RK3568 platform), providing
> > PSCI and the SCMI clock service. The only closed blob in the boot chain is
> > Rockchip's DDR init (rkbin), which is the standard situation for mainline
> > RK356x.
>
> --
> Best,
> Chaoyi
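
A minimal sketch of the per-device-ops idea from the v2/DTE point, written as a user-space C model rather than driver code. It assumes RAM is one contiguous range starting at 0, and every name in it (struct mmu_ops_model, gfp_dma32, pt_alloc_limit(), dte_fits_32bit()) is hypothetical, not the rockchip-iommu driver's own definitions. It only shows the shape of the fix: each MMU instance points at its own ops, so only the NPU's instance needs the variant whose page-table allocations are capped below 4 GiB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4G (1ULL << 32)

/* Hypothetical per-variant ops: only the field relevant here is modelled.
 * Names are illustrative, not the driver's real struct. */
struct mmu_ops_model {
	const char *name;
	bool gfp_dma32; /* models "page tables allocated with GFP_DMA32" */
};

/* Hypothetical per-instance state: each MMU instance carries its own ops
 * pointer instead of all instances sharing one global ops pointer. */
struct mmu_instance {
	const char *dev;
	const struct mmu_ops_model *ops;
};

static const struct mmu_ops_model v2_ops = { "v2", false };
static const struct mmu_ops_model v2_dma32_ops = { "v2 + GFP_DMA32", true };

/* Highest physical address the page-table allocator may hand back to this
 * instance, assuming RAM spans [0, ram_end) with no holes. */
static uint64_t pt_alloc_limit(const struct mmu_instance *mmu, uint64_t ram_end)
{
	assert(ram_end > 0);
	if (mmu->ops->gfp_dma32 && ram_end > SZ_4G)
		return SZ_4G - 1; /* allocation capped below 4 GiB */
	return ram_end - 1;
}

/* True if every page table this instance can get fits a 32-bit DTE. */
static bool dte_fits_32bit(const struct mmu_instance *mmu, uint64_t ram_end)
{
	return pt_alloc_limit(mmu, ram_end) < SZ_4G;
}
```

For example, on an 8 GiB board an instance on &v2_ops can receive page tables up to 8 GiB - 1, while an instance on &v2_dma32_ops never goes above 0xFFFFFFFF. The check only matters for an instance with a 32-bit DTE limit, which in this thread is the NPU MMU.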
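
A minimal sketch of the bit-31 read-modify-write form from the AUTO_GATING point, as user-space C with a simulated register file. The register slot, the macro names and mmio_read()/mmio_write() are hypothetical stand-ins; a driver would access the MMU's ioremap()ed base with readl()/writel(). It only illustrates why the RMW form is used: every bit other than bit 31 keeps its reset or previously programmed value.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated MMIO window. The register index is a placeholder, not an
 * offset taken from the RK3568 TRM. */
#define MMU_REG_COUNT     16u
#define AUTO_GATING_IDX   9u           /* hypothetical register slot */
#define AUTO_GATING_BIT31 (1u << 31)   /* the bit discussed in the thread */

static uint32_t mmio[MMU_REG_COUNT];

static uint32_t mmio_read(unsigned int idx)
{
	assert(idx < MMU_REG_COUNT);
	return mmio[idx];
}

static void mmio_write(unsigned int idx, uint32_t val)
{
	assert(idx < MMU_REG_COUNT);
	mmio[idx] = val;
}

/* Read-modify-write: set bit 31 while preserving every other bit at its
 * current value, instead of writing a constant over the whole register. */
static void set_auto_gating_bit31(void)
{
	uint32_t v = mmio_read(AUTO_GATING_IDX);

	v |= AUTO_GATING_BIT31;
	mmio_write(AUTO_GATING_IDX, v);
}
```

Calling it more than once is harmless: setting a bit that is already set leaves the register unchanged.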
