On Mon, Oct 11, 2021 at 04:53:29PM +0100, Joao Martins wrote:
> On 10/8/21 12:54, Jason Gunthorpe wrote:
> > The only optimization that might work here is to grab the head, then
> > compute the extent of tail pages and amalgamate them. Holding a ref on
> > the head also secures the tails.
>
> How about pmd_page(orig) / pud_page(orig) like what the rest of hugetlb/thp
> checks do? i.e. we would pass pmd_page(orig)/pud_page(orig) to
> __gup_device_huge()
> as an added @head argument. While keeping the same structure of counting tail
> pages
> between @addr .. @end if we have a head page.
The right logic is what everything else does:
page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);
head = try_grab_compound_head(pud_page(orig), refs, flags);
If you can use this, or not, depends entirely on answering the
question of why does __gup_device_huge() exist at all.
This I don't fully know:
1) As discussed quite a few times now, the entire get_dev_pagemap
stuff looks usless and should just be deleted. If you care about
optimizing this I would persue doing that as it will give the
biggest single win.
2) It breaks up the PUD/PMD into tail pages and scans them all
Why? Can devmap combine multiple compound_head's into the same PTE?
Does devmap guarentee that the PUD/PMD points to the head page? (I
assume no)
But I'm looking at this some more and I see try_get_compound_head() is
reading the compound_head with no locking, just READ_ONCE, so that
must be OK under GUP.
It still seems to me the same generic algorithm should work
everywhere, once we get rid of the get_dev_pagemap
start_pfn = pud/pmd_pfn() + pud/pmd_page_offset(addr)
end_pfn = start_pfn + (end - addr) // fixme
if (THP)
refs = end_pfn - start_pfn
if (devmap)
refs = 1
do {
page = pfn_to_page(start_pfn)
head_page = try_grab_compound_head(page, refs, flags)
if (pud/pmd_val() != orig)
err
npages = 1 << compound_order(head_page)
npages = min(npages, end_pfn - start_pfn)
for (i = 0, iter = page; i != npages) {
*pages++ = iter;
mem_map_next(iter, page, i)
}
if (devmap && npages > 2)
grab_compound_head(head_page, npages - 1, flags)
start_pfn += npages;
} while (start_pfn != end_pfn)
Above needs to be cleaned up quite a bit, but is the general idea.
There is an further optimization we can put in where we can know that
'page' is still in a currently grab'd compound (eg last_page+1 = page,
not past compound_order) and defer the refcount work.
> It's interesting how THP (in gup_huge_pmd()) unilaterally computes
> tails assuming pmd_page(orig) is the head page.
I think this is an integral property of THP, probably not devmap/dax
though?
Jason