On Mon, May 19, 2025 at 10:59:49PM +0530, Vasant Hegde wrote:
> Jason, Nicolin, Kevin,
>
> On 5/15/2025 9:36 PM, Jason Gunthorpe wrote:
> > On Thu, May 08, 2025 at 08:02:32PM -0700, Nicolin Chen wrote:
> >> +/**
> >> + * struct iommu_hw_queue_alloc - ioctl(IOMMU_HW_QUEUE_ALLOC)
> >> + * @size: sizeof(struct iommu_hw_queue_alloc)
> >> + * @flags: Must be 0
> >> + * @viommu_id: Virtual IOMMU ID to associate the HW queue with
> >> + * @type: One of enum iommu_hw_queue_type
> >> + * @index: The logical index to the HW queue per virtual IOMMU for a multi-queue
> >> + *         model
> >> + * @out_hw_queue_id: The ID of the new HW queue
> >> + * @base_addr: Base address of the queue memory in guest physical address space
> >> + * @length: Length of the queue memory in the guest physical address space
> >> + *
> >> + * Allocate a HW queue object for a vIOMMU-specific HW-accelerated queue, which
> >> + * allows HW to access a guest queue memory described by @base_addr and @length.
> >> + * Upon success, the underlying physical pages of the guest queue memory will be
> >> + * pinned to prevent VMM from unmapping them in the IOAS until the HW queue gets
> >> + * destroyed.
> >
> > Do we have way to make the pinning optional?
> >
> > As I understand AMD's system the iommu HW itself translates the
> > base_addr through the S2 page table automatically, so it doesn't need
> > pinned memory and physical addresses but just the IOVA.
>
> Correct. HW will translate GPA -> SPA automatically using below information.
>
> AMD IOMMU need special device ID to setup with GPA -> SPA mapping per VM.
> and its programmed in VF Control BAR (VFCntlMMIO Offset {16’b[GuestID],
> 6’b01_0000} Guest Miscellaneous Control Register). IOMMU HW will use this
> address for GPA to SPA translation for buffers like command buffer.
>
> So HW will use Base address (GPA), head/tail pointer to get the offset from
> Base. Then it will use GPA -> SPA translation.
>
> > Perhaps for this reason the pinning should be done with a function
> > call from the driver?
>
> We still need to make sure memory allocated for page is present in memory so
> that IOMMU HW can access it.
>
> Pinning at the time of guest boot is enough here -OR- do we need to increase
> reference in queue_alloc() path ?

For NVIDIA's vCMDQ, which reads host PAs directly, pages should be
pinned once when the stage 2 mappings are created for the guest RAM,
and iommu_hw_queue_alloc() should pin the pages again to prevent the
gPA from being unmapped in the stage 2 page table. Otherwise, it would
be a security hole, as HW would continue to read the unmapped memory
through the physical address space.

I understand that the AMD Command Buffer also needs the S2 mappings to
be present in order to work correctly. But what happens if the queue
memory isn't pinned (or even gets unmapped)? Will it raise a
translation fault, as opposed to HW reading the unmapped memory?

If so, I think this is Jason's point: a security hole would be
unlikely, i.e. for AMD, pinning the physical pages in
iommu_hw_queue_alloc() is likely optional.

Thanks
Nicolin
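
For illustration only, the double-pin rule described for the host-PA
case can be modelled in a few lines of plain C. This is a toy sketch,
not kernel code: struct toy_page and the toy_*() helpers are
hypothetical names, and the -EBUSY/-ENOENT return values are
assumptions chosen for the example rather than what the series returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of one guest page; every name here is hypothetical. */
struct toy_page {
	bool mapped;		/* present in the stage 2 page table */
	unsigned int pins;	/* outstanding pin references */
};

/* Map the page into stage 2 and take the first pin (guest RAM setup). */
static void toy_map(struct toy_page *pg)
{
	pg->mapped = true;
	pg->pins = 1;
}

/* Take an extra pin, as a HW queue allocation would for a host-PA reader. */
static int toy_queue_pin(struct toy_page *pg)
{
	if (!pg->mapped)
		return -ENOENT;	/* assumed error code */
	pg->pins++;
	return 0;
}

/* Drop the extra pin when the HW queue is destroyed. */
static void toy_queue_unpin(struct toy_page *pg)
{
	assert(pg->pins > 1);
	pg->pins--;
}

/* Unmap is refused while a HW queue still holds its extra pin. */
static int toy_unmap(struct toy_page *pg)
{
	if (!pg->mapped)
		return -ENOENT;	/* assumed error code */
	if (pg->pins > 1)
		return -EBUSY;	/* assumed error code */
	pg->pins = 0;
	pg->mapped = false;
	return 0;
}
```

In this model, a HW that translates the queue address through stage 2
on every access would simply skip toy_queue_pin(), since an unmap would
then show up as a translation fault rather than a stale physical read.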