Hi Midgy,

On 6/8/2026 5:03 AM, Midgy Balon wrote:
> Hi Chaoyi,
>
> Thanks a lot for looking at this -- input from Rockchip is exactly what this
> series needs.
>
>> Hmmm. If I understand correctly, the NPU IOMMU should be v2 rather than v1,
>> implying it should support 40-bit PAs. Nevertheless, please note that the
>> upper limit for DTE is 32 bits.
>
> Understood, and that 32-bit-DTE note is the crux of the trouble I had, so let
> me lay out what I see and ask how you'd prefer to solve it.
>
> The mainline node is already v2 (rockchip,rk3568-iommu in rk356x-base.dtsi).
> The problem on this 8 GiB board: with the v2 ops the page-table allocations
> (gfp_flags == 0) can land above 4 GiB, so the DTE ends up > 32 bits and the
> NPU's first translation faults with DMA_READ_ERROR. To work around that I had
> switched the NPU MMU to the v1 compatible (rockchip,iommu), whose ops set
> GFP_DMA32 and keep the DTE sub-4 GiB. That works in isolation, but because the
> driver keeps a single global rk_ops, a v1 NPU MMU then trips
> WARN_ON(rk_ops != ops) against the SoC's v2 instances (VOP/VDEC), which is why
> I based the series on Simon's per-device-ops work.
>
> So my question: with per-device ops in place, what's the intended way to keep
> the NPU MMU on v2 *and* cap its DTE at 32 bits on boards with >4 GiB of RAM?
> A v2 ops variant carrying GFP_DMA32 for this device, or is there a register/
> config bit that constrains the DTE address? I'd rather follow the Rockchip
> intent here than carry the v1 workaround. (Simon, cc'd -- this is right next
> to your per-device-ops series.)
>
If Simon's method works, please use it :)

>> Can these operations not be completed via the pmdomain driver?
>> If some operations are controlled by TF-A, are you using open source TF-A?
>
> Most of it is in pmdomain already. Power-on and NoC de-idle are done by the
> RK3568 NPU power domain (genpd) at power-on -- the driver no longer pokes the
> PMU directly. Two things remain outside it:
>
> - vdd_npu: I mark it regulator-always-on in DT rather than wiring it as the
>   domain's domain-supply, because as a domain-supply it created a device-link
>   to the I2C PMIC (rk809) and genpd's power-off QoS-save path then hung
>   reading the NPU QoS registers behind the (gated) NoC. If there's a clean
>   way to let genpd own vdd_npu without that I2C ordering deadlock I'd much
>   prefer that -- pointers welcome.
>

Please refer to the patch below regarding the RK3588 NPU pmdomain.
In short, you need to set "need_regulator" for the RK3568 NPU pmdomain.

https://lore.kernel.org/all/[email protected]/

> - the NPU compute clock (PVTPLL): set from the driver via SCMI, and only
>   needed for actual compute, not for bring-up.
>
> One more pmdomain observation from testing, possibly relevant to how the NPU
> domain should be modelled: the domain's power-off/on cycle doesn't reliably
> re-de-idle the NoC. If the NPU is probed after genpd has already powered the
> (unused) domain off, the power-on de-idle fails ("failed to set idle on domain
> 'npu'") and the NPU IOMMU then takes an external abort on its first MMIO
> access. Probing the NPU before the unused-domain power-off, or marking the
> domain always-on, both avoid it. Is the NoC de-idle expected to work on a
> genpd re-power here, or should this domain effectively stay on?
>

I'm not quite sure what's going on with PVTPLL and the NoC.
Maybe @Finley knows about this?

> On TF-A: yes -- bl31 is built from upstream arm-trusted-firmware
> (github.com/ARM-software/arm-trusted-firmware, RK3568 platform), providing
> PSCI and the SCMI clock service. The only closed blob in the boot chain is
> Rockchip's DDR init (rkbin), which is the standard situation for mainline
> RK356x.

--
Best,
Chaoyi
