On 28/1/26 01:25, Jason Gunthorpe wrote:
On Tue, Jan 27, 2026 at 07:08:39PM +1100, Alexey Kardashevskiy wrote:
Oh so it doesn't actually check the RMP, it is just rounding down to
two fixed sizes?
No, it does check RMP.
If the IOMMU page walk ends at a >=2MB page, it rounds down to
2MB (the nearest supported RMP size) and checks the RMP at 2MB;
if that check fails because of the page size, it won't retry at
4K (even though it theoretically could).
The expectation is that the host OS makes sure the IOMMU uses
page sizes equal to or bigger than the closest smaller RMP page
size, so there is no need for two RMP checks.
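Roughly, as a pseudo-C sketch (the names here are made up for
illustration, this is not the real IOMMU or RMP code):

  #define RMP_SZ_4K   (4UL << 10)
  #define RMP_SZ_2M   (2UL << 20)

  /* Hypothetical helper: asks the RMP whether the access at this
   * pfn is allowed at the given RMP page size. */
  bool rmp_check(unsigned long pfn, unsigned long rmp_size);

  static bool iommu_walk_rmp_check(unsigned long pfn,
                                   unsigned long walk_size)
  {
          /* Round the IOMMU walk result down to the nearest
           * supported RMP size: anything >= 2MB becomes one
           * 2MB check. */
          unsigned long rmp_size = walk_size >= RMP_SZ_2M ?
                                   RMP_SZ_2M : RMP_SZ_4K;

          /* A single check only: no 4K retry if the 2MB check
           * fails because of the page size. */
          return rmp_check(pfn, rmp_size);
  }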
Seems dysfunctional to me.
ARM is pushing a thing where encrypt/decrypt has to work on certain aligned
granule sizes > PAGE_SIZE; you could use that mechanism to select a 2M
size for AMD too and avoid this.
2M minimum on every DMA map?
On every swiotlb allocation pool chunk, yeah.
Nah, it is quite easy to force 2MB on swiotlb (just do it once and
forget), but currently any guest page can be converted to shared and
DMA-mapped, and that path skips swiotlb.
Upstream Linux doesn't support that: only SWIOTLB or special DMA
coherent memory can be DMA mapped in CC systems. You can't take a
random page, make it shared and then DMA map it.
Well, my test device driver calls dma_alloc_coherent(), which does exactly
that: allocate pages and convert them to shared at 4K granularity.
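Something like this (a minimal sketch of the test driver, not the
actual code; my understanding is that on a CC guest the DMA-direct
path behind dma_alloc_coherent() converts the freshly allocated
pages to shared, roughly via set_memory_decrypted(), at 4K
granularity):

  dma_addr_t dma_handle;
  void *cpu_addr;

  /* Allocates guest pages and converts them to shared before
   * returning; no swiotlb bounce buffer is involved here. */
  cpu_addr = dma_alloc_coherent(dev, PAGE_SIZE, &dma_handle,
                                GFP_KERNEL);
  if (!cpu_addr)
          return -ENOMEM;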
What happens if the guest puts 4K pages into its AMDv2 table and the RMP
is 2M?
Is this AMDv2 an NPT (then it is going to fail), or a nested IOMMU (never
tried, in the works, I suspect failure)?
Yes, some future nested vIOMMU
If the guest can't have a 4K page in its vIOMMU while the host is using
a 2M RMP then the whole architecture is broken, sorry.
I re-read what I wrote and I think I was wrong: the S2 table (guest
physical -> host physical) has to match the RMP, not the S1.
Really? So the HW can fix the 4K/2M mismatch for the S1 but doesn't
bother for the S2? Seems like a crazy design to me.
S2 is controlled by the HV, and the RMP protects against wrong mappings of
the guest physical memory in that S2; the RMP is a HW firewall.
What happens if you don't have a vIOMMU, have a single translation
stage and only use the S1 (AMDv2) page table in the hypervisor? Then
does the HW fix it? Or does it only fix it with two stages enabled?
The HW translates a DMA handle to a host pfn, and then the RMP checks that
[pfn..pfn+size] is assigned to the correct ASID, that the page size matches,
and that the gfn matches.
The RMP does not check S1 translations inside the guest, only S2. The RMP is
not fixing page sizes or anything; it just says yes/no to the access.
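In pseudo-C, the check amounts to something like this (illustrative
field and function names, not the real RMP entry layout):

  struct rmp_entry {
          u64 asid;       /* which guest owns this host page */
          u64 gfn;        /* guest pfn this host page backs */
          u64 page_size;  /* 4K or 2M */
  };

  static bool rmp_allows(const struct rmp_entry *e, u64 asid,
                         u64 gfn, u64 size)
  {
          /* Strictly yes/no: the RMP never fixes up a mismatch,
           * it only fails the access. */
          return e->asid == asid && e->gfn == gfn &&
                 e->page_size == size;
  }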
iommufd won't deal with memory maps for IO; the secure world will
handle that through KVM.
Is QEMU going to skip IOMMU mapping entirely? So when the device
is transitioned from untrusted (when everything is mapped via VFIO or
the IOMMU) to trusted, QEMU will unmap everything, and then the guest
will map everything again, but this time via KVM, bypassing QEMU
entirely? Thanks,
On ARM there are different S2s for the IOMMU, one for T=1 and one for
T=0 traffic. The T=1 one is fully controlled by the secure world and is
equal to the CPU S2. The T=0 one is fully controlled by qemu and acts
like a normal system. The T=0 one can only access guest shared memory.
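Conceptually something like this (an illustrative sketch, not real
SMMU code; the names are made up):

  /* Pick the S2 based on the T bit of the transaction. */
  static struct s2_table *select_s2(struct dev_ctx *ctx, bool t_bit)
  {
          /* T=1: owned by the secure world, same as the CPU S2.
           * T=0: owned by qemu, covers only guest shared memory. */
          return t_bit ? ctx->secure_s2 : ctx->qemu_s2;
  }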
Does the T=0 table still have all the guest memory mapped (with the
expectation that what is not allowed won't be accessed through that
table)? Thanks,
Jason
--
Alexey