Thanks Alex, Baolu. The patch fixes things at my end. No kernel log flooding is seen now (tested starting/stopping the VM more than 10 times).
Thanks and Regards,
Ajay

On Thu, Nov 11, 2021 at 6:03 AM Alex Williamson
<alex.william...@redhat.com> wrote:
>
> When supporting only the .map and .unmap callbacks of iommu_ops,
> the IOMMU driver can make assumptions about the size and alignment
> used for mappings based on the driver-provided pgsize_bitmap. VT-d
> previously used essentially PAGE_MASK for this bitmap, as any power
> of two mapping was acceptably filled by native page sizes.
>
> However, with the .map_pages and .unmap_pages interface we're now
> getting page-size and count arguments. If we simply combine these
> as (page-size * count) and make use of the previous map/unmap
> functions internally, any size and alignment assumptions are very
> different.
>
> As an example, a given vfio device assignment VM will often create
> a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that
> does not support IOMMU super pages, the unmap_pages interface will
> ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level()
> will recurse down to level 2 of the page table, where the first half
> of the pfn range exactly matches the entire pte level. We clear the
> pte and increment the pfn by the level size, but (oops) the next pte
> is on a new page, so we exit the loop and pop back up a level. When
> we then update the pfn based on that higher level, we seem to assume
> that the previous pfn value was at the start of the level. In this
> case the level size is 256K pfns, which we add to the base pfn and
> get a result of 0x7fe00, which is clearly greater than 0x401ff,
> so we're done. Meanwhile we never cleared the ptes for the remainder
> of the range. When the VM remaps this range, we're overwriting valid
> ptes and the VT-d driver complains loudly, as reported by the user
> in the report linked below.
>
> The fix for this seems relatively simple: if each iteration of the
> loop in dma_pte_clear_level() is assumed to clear to the end of the
> level pte page, then our next pfn should be calculated from level_pfn
> rather than our working pfn.
>
> Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
> Reported-by: Ajay Garg <ajaygargn...@gmail.com>
> Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargn...@gmail.com/
> Signed-off-by: Alex Williamson <alex.william...@redhat.com>
> ---
>  drivers/iommu/intel/iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d75f59ae28e6..f6395f5425f0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1249,7 +1249,7 @@ static struct page *dma_pte_clear_level(struct dmar_domain *domain, int level,
>  					       freelist);
>  		}
>  next:
> -		pfn += level_size(level);
> +		pfn = level_pfn + level_size(level);
>  	} while (!first_pte_in_page(++pte) && pfn <= last_pfn);
>
>  	if (first_pte)

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu