Hi Nicolin,

On 5/19/2025 11:44 PM, Nicolin Chen wrote:
> On Mon, May 19, 2025 at 10:59:49PM +0530, Vasant Hegde wrote:
>> Jason, Nicolin, Kevin,
>>
>> On 5/15/2025 9:36 PM, Jason Gunthorpe wrote:
>>> On Thu, May 08, 2025 at 08:02:32PM -0700, Nicolin Chen wrote:
>>>> +/**
>>>> + * struct iommu_hw_queue_alloc - ioctl(IOMMU_HW_QUEUE_ALLOC)
>>>> + * @size: sizeof(struct iommu_hw_queue_alloc)
>>>> + * @flags: Must be 0
>>>> + * @viommu_id: Virtual IOMMU ID to associate the HW queue with
>>>> + * @type: One of enum iommu_hw_queue_type
>>>> + * @index: The logical index to the HW queue per virtual IOMMU for a multi-queue
>>>> + *         model
>>>> + * @out_hw_queue_id: The ID of the new HW queue
>>>> + * @base_addr: Base address of the queue memory in guest physical address space
>>>> + * @length: Length of the queue memory in the guest physical address space
>>>> + *
>>>> + * Allocate a HW queue object for a vIOMMU-specific HW-accelerated queue, which
>>>> + * allows HW to access a guest queue memory described by @base_addr and @length.
>>>> + * Upon success, the underlying physical pages of the guest queue memory will be
>>>> + * pinned to prevent VMM from unmapping them in the IOAS until the HW queue gets
>>>> + * destroyed.
>>>
>>> Do we have way to make the pinning optional?
>>>
>>> As I understand AMD's system the iommu HW itself translates the
>>> base_addr through the S2 page table automatically, so it doesn't need
>>> pinned memory and physical addresses but just the IOVA.
>>
>> Correct. HW will translate GPA -> SPA automatically using below information.
>>
>> AMD IOMMU need special device ID to setup with GPA -> SPA mapping per VM.
>> and its programmed in VF Control BAR (VFCntlMMIO Offset {16’b[GuestID],
>> 6’b01_0000} Guest Miscellaneous Control Register). IOMMU HW will use this
>> address for GPA to SPA translation for buffers like command buffer.
>>
>> So HW will use Base address (GPA), head/tail pointer to get the offset from
>> Base. Then it will use GPA -> SPA translation.
>>
>>>
>>> Perhaps for this reason the pinning should be done with a function
>>> call from the driver?
>>
>> We still need to make sure memory allocated for page is present in memory so
>> that IOMMU HW can access it.
>>
>> Pinning at the time of guest boot is enough here -OR- do we need to increase
>> reference in queue_alloc() path ?
>
> For NVIDIA's vCMDQ that reads host PA directly, pages should be
> pinned once when stage 2 mappings are created for the guest RAM,
> and iommu_hw_queue_alloc() should pin the pages again to prevent
> the gPA from being unmapped in the stage 2 page table. Otherwise
> it will be a security hole, as HW continues to read the unmapped
> memory through physical address space.
>
> I understand that AMD Command Buffer also needs the S2 mappings
> to be present in order to work correctly. But what happens if a
> queue memory that isn't pinned (or even gets unmapped)? Will it
> raise a translation fault v.s. HW reading the unmapped memory?

If the page is unmapped, the stage 2 (host) page table gets updated. The IOMMU
will then not be able to find the page, and it logs a fault.

>
> If so, I think this is Jason's point: there would be unlikely a
> security hole, i.e. for AMD, iommu_hw_queue_alloc() pinning the
> physical pages is likely optional.

I think so.

-Vasant
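
For reference, the request layout described by the quoted kernel-doc can be
sketched in plain C as below. This is only an illustration: the field order
follows the kernel-doc, but the field widths, the struct name
hw_queue_alloc_sketch and the helper hw_queue_alloc_init() are assumptions and
are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative stand-in for struct iommu_hw_queue_alloc.
 * Field order follows the quoted kernel-doc; the widths are assumed.
 */
struct hw_queue_alloc_sketch {
	uint32_t size;            /* sizeof() of this struct */
	uint32_t flags;           /* must be 0 */
	uint32_t viommu_id;       /* vIOMMU the HW queue is associated with */
	uint32_t type;            /* one of enum iommu_hw_queue_type */
	uint32_t index;           /* logical queue index per vIOMMU */
	uint32_t out_hw_queue_id; /* output: ID of the new HW queue */
	uint64_t base_addr;       /* queue base, guest physical address */
	uint64_t length;          /* queue length in guest physical space */
};

/* Hypothetical helper: zero the request, then fill in the input fields. */
static void hw_queue_alloc_init(struct hw_queue_alloc_sketch *req,
				uint32_t viommu_id, uint32_t type,
				uint32_t index, uint64_t base_gpa,
				uint64_t length)
{
	assert(req != NULL);
	memset(req, 0, sizeof(*req)); /* leaves flags and the output ID at 0 */
	req->size = sizeof(*req);
	req->viommu_id = viommu_id;
	req->type = type;
	req->index = index;
	req->base_addr = base_gpa;
	req->length = length;
}
```

Note that base_addr and length carry only a GPA range. Whether the kernel
additionally pins the backing pages (as needed for a HW that reads host PA
directly) or relies on the HW faulting on a missing S2 mapping is exactly the
point under discussion above and is not expressed in this layout.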