On Fri, Sep 26, 2025 at 07:47:31AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <[email protected]>
> > Sent: Thursday, September 4, 2025 1:47 AM
> >
> > map is slightly complicated because it has to handle a number of special
> > edge cases:
> > - Overmapping a previously shared table with an OA - requries validating
> > and freeing the possibly empty tables
> > - Doing the above across an entire to-be-created contiguous entry
> > - Installing a new shared table level concurrently with another thread
> > - Expanding the table by adding more top levels
>
> what is 'shared table'? Looks this term doesn't appear in previous patches.
"shared table level". It is the actual 4k page. Shared means
more than one iommu_map() calls are using indexes in it to make their
mappings work.
like if you make 4k twice then the PGD/PMD/etc table would be "shared"
> also it's unclear to me why overmapping a previously shared table can
> succeed while overmapping leaf entries cannot (w/ -EADDRINUSE)
It has to be empty, let me clarify
- Overmapping a previously shared, but now empty, table level with an OA.
Requries validating and freeing the possibly empty tables
> > +
> > + /* Calculate target page size and level for the leaves */
> > + if (pt_has_system_page(common) && pgsize == PAGE_SIZE &&
> > pgcount == 1) {
> > + PT_WARN_ON(!(pgsize_bitmap & PAGE_SIZE));
> > + if (log2_mod(iova | paddr, PAGE_SHIFT))
> > + return -ENXIO;
> > + map.leaf_pgsize_lg2 = PAGE_SHIFT;
> > + map.leaf_level = 0;
> > + single_page = true;
> > + } else {
> > + map.leaf_pgsize_lg2 = pt_compute_best_pgsize(
> > + pgsize_bitmap, range.va, range.last_va, paddr);
> > + if (!map.leaf_pgsize_lg2)
> > + return -ENXIO;
> > + map.leaf_level =
> > + pt_pgsz_lg2_to_level(common, map.leaf_pgsize_lg2);
>
> Existing driver checks alignment on pgsize, e.g. intel-iommu:
>
> if (!IS_ALIGNED(iova | paddr, pgsize))
> return -EINVAL;
Yes
> But pt_compute_best_pgsize() doesn't use 'pgsize' and only have checks
> on calculated pgsz_lg2:
pgsz_lg2 is the same as 'pgsize' in the intel driver..
pt_compute_best_pgsize() takes in a bitmap of all supported page sizes
at all levels and returns a single page size that should be used for
this mapping.
The single page size satisfies the same alignemnt checks vtd had:
> PT_WARN_ON(log2_mod(va, pgsz_lg2) != 0);
> PT_WARN_ON(oalog2_mod(oa, pgsz_lg2) != 0);
The above are equivalent to IS_ALIGNED(iova | paddr, pgsize).
If no page sizes match the alignment of va and oa then it returns 0
and we fail:
+ if (!map.leaf_pgsize_lg2)
+ return -ENXIO;
If it doesn't fail then it returns the single pgsize that should be
used for this mapping and then we seek to that table level:
+ map.leaf_level =
+ pt_pgsz_lg2_to_level(common, map.leaf_pgsize_lg2);
Then there is another safety check during install leaf through
pt_check_install_leaf_args()
if (PT_WARN_ON(oalog2_mod(oa, oasz_lg2)))
return false;
By the time we get here oasz_lg2 is also pgsize.
> Looks not identical.
It rejects unaligned the same way though.
Further, this is all dead code right now, even the vtd code. Things
were switched over to map_pages() and so the core code has this:
if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
return -EINVAL;
then iommu_pgsize() is guarenteed to work similarly to
pt_compute_best_pgsize().
Meaning the drivers can't see unaligned inputs anyhow.
Jason