On 10/8/21 12:54, Jason Gunthorpe wrote:
> On Fri, Aug 27, 2021 at 03:58:13PM +0100, Joao Martins wrote:
>> @@ -2252,16 +2265,25 @@ static int __gup_device_huge(unsigned long pfn,
>> unsigned long addr,
>> ret = 0;
>> break;
>> }
>> - SetPageReferenced(page);
>> - pages[*nr] = page;
>> - if (unlikely(!try_grab_page(page, flags))) {
>> - undo_dev_pagemap(nr, nr_start, flags, pages);
>> +
>> + head = compound_head(page);
>> + /* @end is assumed to be limited at most one compound page */
>> + if (PageHead(head))
>> + next = end;
>> + refs = record_subpages(page, addr, next, pages + *nr);
>> +
>> + SetPageReferenced(head);
>> + if (unlikely(!try_grab_compound_head(head, refs, flags))) {
>
> I was thinking about this some more, and this ordering doesn't seem
> like a good idea. We shouldn't be looking at any part of the struct
> page without holding the refcount, certainly not the compound_head()
>
> The only optimization that might work here is to grab the head, then
> compute the extent of tail pages and amalgamate them. Holding a ref on
> the head also secures the tails.
>
How about pmd_page(orig) / pud_page(orig) like what the rest of hugetlb/thp
checks do? i.e. we would pass pmd_page(orig)/pud_page(orig) to
__gup_device_huge()
as an added @head argument. While keeping the same structure of counting tail
pages
between @addr .. @end if we have a head page.
Albeit this lingers on whether it's OK to call PageHead() .. The PageHead
policy is for
any page (PF_ANY) so no hidden calls to compound_head() when testing that page
flag. but
in the end it accesses struct page flags which is well, still struct page data.
We could also check pgmap for a non-zero geometry (which tells us that
pmd_page(orig) does
represent a head page). And that would save us from looking at struct page data
today, but
it would introduce problems later whenever we remove the pgmap ref grab in
gup_device_huge().
So the only viable might be to do the grab, count tails and fixup-ref like you
suggest
above, and take the perf hit of one extra atomic op :(
It's interesting how THP (in gup_huge_pmd()) unilaterally computes tails
assuming
pmd_page(orig) is the head page.