On Tue, May 13, 2025 at 01:03:15PM +0300, Zhi Wang wrote:
> On Mon, 12 May 2025 11:06:17 -0300
> Jason Gunthorpe <j...@nvidia.com> wrote:
>
> > On Mon, May 12, 2025 at 07:30:21PM +1000, Alexey Kardashevskiy wrote:
> >
> > > > > I'm surprised by this.. iommufd shouldn't be doing PCI stuff,
> > > > > it is just about managing the translation control of the device.
> > > >
> > > > I have a little difficulty to understand. Is TSM bind PCI stuff?
> > > > To me it is. Host sends PCI TDISP messages via PCI DOE to put the
> > > > device in TDISP LOCKED state, so that device behaves differently
> > > > from before. Then why put it in IOMMUFD?
> > >
> > >
> > > "TSM bind" sets up the CPU side of it, it binds a VM to a piece of
> > > IOMMU on the host CPU. The device does not know about the VM, it
> > > just enables/disables encryption by a request from the CPU (those
> > > start/stop interface commands). And IOMMUFD won't be doing DOE, the
> > > platform driver (such as AMD CCP) will. Nothing to do for VFIO here.
> > >
> > > We probably should notify VFIO about the state transition but I do
> > > not know what VFIO would want to do in response.
> >
> > We have an awkward fit for what CCA people are doing to the various
> > Linux APIs. Looking somewhat maximally across all the arches a "bind"
> > for a CC vPCI device creation operation does:
> >
> > - Setup the CPU page tables for the VM to have access to the MMIO
> > - Revoke hypervisor access to the MMIO
> > - Setup the vIOMMU to understand the vPCI device
> > - Take over control of some of the IOVA translation, at least for
> >   T=1, and route to the vIOMMU
> > - Register the vPCI with any attestation functions the VM might use
> > - Do some DOE stuff to manage/validate TDISP/etc
> >
> > So we have interactions of things controlled by PCI, KVM, VFIO, and
> > iommufd all mushed together.
> >
> > iommufd is the only area that already has a handle to all the required
> > objects:
> > - The physical PCI function
> > - The CC vIOMMU object
> > - The KVM FD
> > - The CC vPCI object
> >
> > Which is why I have been thinking it is the right place to manage
> > this.
> >
> > It doesn't mean that iommufd is suddenly doing PCI stuff, no, that
> > stays in VFIO.
> >
> >
> > > > > So your issue is you need to shoot down the dmabuf during vPCI
> > > > > device destruction?
> > > >
> > > > I assume "vPCI device" refers to assigned device in both shared
> > > > mode & private mode. So no, I need to shoot down the dmabuf during
> > > > TSM unbind, a.k.a. when assigned device is converting from
> > > > private to shared. Then recover the dmabuf after TSM unbind. The
> > > > device could still work in VM in shared mode.
> >
> > What are you trying to protect with this? Is there some intelism where
> > you can't have references to encrypted MMIO pages?
> >
>
> I think it is a matter of design choice. The encrypted MMIO page is
> related to the TDI context and secure second level translation table
> (S-EPT), and the S-EPT is related to the confidential VM's context.
>
> AMD and ARM have another level of HW control which, together
> with a TSM-owned meta table, can simply mask out the access to those
> encrypted MMIO pages. Thus, the life cycle of the encrypted mappings in
> the second level translation table can be de-coupled from the TDI
> unbind. They can be reaped harmlessly later by the hypervisor in
> another path.
>
> The Intel platform, on the other hand, doesn't have that additional
> level of HW control by design. Thus, the cleanup of encrypted MMIO page
> mapping in the S-EPT has to be coupled tightly with TDI context
> destruction in the TDI unbind process.

Thanks for the accurate explanation. Yes, in TDX, a reference/mapping
to the encrypted MMIO page means a CoCo-VM owns the MMIO page. So TDX
firmware won't allow the CC vPCI device (which physically owns the MMIO
page) to be unbound/freed from a CoCo-VM while the VM still has the
S-EPT mapping.

AMD doesn't use the KVM page table to track CC ownership, so there is
no need to interact with KVM.

Thanks,
Yilun

>
> If the TDI unbind is triggered in VFIO/IOMMUFD, there has to be a
> cross-module notification to KVM to do cleanup in the S-EPT.
>
> So shooting down the DMABUF object (encrypted MMIO page) means shooting
> down the S-EPT mapping and recovering the DMABUF object means
> reconstructing the non-encrypted MMIO mapping in the EPT after the TDI
> is unbound.
>
> Z.
> >
> > > > What I really want is, one SW component to manage MMIO dmabuf,
> > > > secure iommu & TSM bind/unbind. So easier coordinate these 3
> > > > operations cause these ops are interconnected according to secure
> > > > firmware's requirement.
> > >
> > > This SW component is QEMU. It knows about FLRs and other config
> > > space things, it can destroy all these IOMMUFD objects and talk to
> > > VFIO too, I've tried, so far it is looking easier to manage. Thanks,
> >
> > Yes, qemu should be sequencing this. The kernel only needs to enforce
> > any rules required to keep the system from crashing.
> >
> > Jason
> >
>
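
To make the ordering constraint discussed above concrete, here is a toy
model in Python. It is only a sketch: the Tdi class, its fields, and the
unbind_in_order() helper are hypothetical names invented for
illustration, not kernel, KVM, or TDX module interfaces. It assumes just
the two behaviours described in this thread: on TDX the unbind is
refused while the private MMIO mapping still exists in the S-EPT, while
a platform with an extra TSM-owned access-control table can mask access
at unbind time and reap the mapping later.

```python
# Toy model only: every name here is hypothetical, not a real kernel,
# KVM, or TDX module interface.


class Tdi:
    """A device interface (TDI) that can be bound to one confidential VM."""

    def __init__(self, name, unbind_needs_zap):
        self.name = name
        # True models the TDX behaviour described in the thread; False
        # models a platform with an extra TSM-owned access-control table.
        self.unbind_needs_zap = unbind_needs_zap
        self.bound = False
        self.private_mmio_mapped = False  # entry in the secure 2nd-level table
        self.access_masked = False        # blocked by the extra HW control

    def bind(self):
        self.bound = True
        self.private_mmio_mapped = True
        self.access_masked = False

    def zap_private_mmio(self):
        # Stands in for the cross-module cleanup step (KVM removing the
        # private MMIO mapping from the secure translation table).
        self.private_mmio_mapped = False

    def unbind(self):
        if self.unbind_needs_zap and self.private_mmio_mapped:
            # Models the firmware refusing to release the device while
            # the VM still holds a mapping to its MMIO.
            raise RuntimeError(f"{self.name}: private MMIO still mapped")
        if self.private_mmio_mapped:
            # Decoupled case: mask access now, reap the mapping later.
            self.access_masked = True
        self.bound = False


def unbind_in_order(tdi):
    """Sequence the two steps in the order the coupled case requires."""
    tdi.zap_private_mmio()
    tdi.unbind()
```

In this model, calling unbind() directly on a bound Tdi("tdx", True)
raises, while unbind_in_order() succeeds; a Tdi created with
unbind_needs_zap=False can be unbound first and have zap_private_mmio()
called at any later point.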