On 10/8/21 12:54, Jason Gunthorpe wrote:
> On Fri, Aug 27, 2021 at 03:58:13PM +0100, Joao Martins wrote:
>> @@ -2252,16 +2265,25 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
>>                      ret = 0;
>>                      break;
>>              }
>> -            SetPageReferenced(page);
>> -            pages[*nr] = page;
>> -            if (unlikely(!try_grab_page(page, flags))) {
>> -                    undo_dev_pagemap(nr, nr_start, flags, pages);
>> +
>> +            head = compound_head(page);
>> +            /* @end is assumed to be limited at most one compound page */
>> +            if (PageHead(head))
>> +                    next = end;
>> +            refs = record_subpages(page, addr, next, pages + *nr);
>> +
>> +            SetPageReferenced(head);
>> +            if (unlikely(!try_grab_compound_head(head, refs, flags))) {
> 
> I was thinking about this some more, and this ordering doesn't seem
> like a good idea. We shouldn't be looking at any part of the struct
> page without holding the refcount, certainly not the compound_head()
> 
> The only optimization that might work here is to grab the head, then
> compute the extent of tail pages and amalgamate them. Holding a ref on
> the head also secures the tails.
> 

How about using pmd_page(orig) / pud_page(orig), like the rest of the
hugetlb/thp checks do? i.e. we would pass pmd_page(orig)/pud_page(orig) to
__gup_device_huge() as an added @head argument, while keeping the same
structure of counting tail pages between @addr .. @end if we have a head page.

That still leaves the question of whether it's OK to call PageHead(). The
PageHead policy is PF_ANY (any page), so there are no hidden calls to
compound_head() when testing that page flag. But in the end it accesses struct
page flags, which is, well, still struct page data.

We could also check pgmap for a non-zero geometry (which tells us that
pmd_page(orig) does represent a head page). That would save us from looking at
struct page data today, but it would introduce problems later whenever we
remove the pgmap ref grab in gup_device_huge().

So the only viable option might be to do the grab, count the tails and fix up
the ref like you suggest above, and take the perf hit of one extra atomic op :(
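
To make that ordering concrete, here is a rough userspace sketch (not kernel
code, and not the patch). struct model_page, model_try_grab_head() and
model_grab_range() are names made up for illustration, and the head is assumed
to be handed in by the caller, the way an added @head argument would be. The
subpages are modelled as an array starting at the head.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in only: this is NOT the kernel's struct page. */
struct model_page {
	atomic_int refcount;	/* only meaningful on the head in this model */
};

/* Take @refs references unless the count already dropped to zero. */
static bool model_try_grab_head(struct model_page *head, int refs)
{
	int old = atomic_load(&head->refcount);

	do {
		if (old == 0)
			return false;	/* page is going away, do not touch it */
	} while (!atomic_compare_exchange_weak(&head->refcount, &old,
					       old + refs));
	return true;
}

/*
 * Grab first, inspect second:
 *  1) pin the head with a single reference,
 *  2) only then walk and record the subpages,
 *  3) fix up the count with the remaining references (the extra atomic op).
 * Returns the number of pages recorded in @out, 0 on failure.
 */
static int model_grab_range(struct model_page *head, size_t first,
			    size_t count, struct model_page **out)
{
	size_t i;

	if (count == 0)
		return 0;

	if (!model_try_grab_head(head, 1))
		return 0;

	/* The head is pinned, so the tails cannot go away under us. */
	for (i = 0; i < count; i++)
		out[i] = head + first + i;

	/* A plain add is enough here: the count cannot be zero anymore. */
	if (count > 1)
		atomic_fetch_add(&head->refcount, (int)(count - 1));

	return (int)count;
}
```

The cost relative to the patch as posted is visible in step 3: one
compare-exchange to pin the head plus one add for the remaining refs, instead
of a single grab of all refs at once.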

It's interesting how THP (in gup_huge_pmd()) unilaterally computes tails
assuming pmd_page(orig) is the head page.
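
For reference, the arithmetic involved is roughly the following. This is a
userspace sketch with assumed constants (4K base pages, 2M PMDs); the MODEL_*
macros and model_* helpers are hypothetical names, not kernel symbols. It only
illustrates how a subpage index and a tail count fall out of @addr and @end
once the PMD page is taken to be the head.

```c
#include <stddef.h>

/* Assumed geometry for illustration: 4K pages, 2M PMD mappings. */
#define MODEL_PAGE_SHIFT	12UL
#define MODEL_PMD_SHIFT		21UL
#define MODEL_PMD_MASK		(~((1UL << MODEL_PMD_SHIFT) - 1))

/* Index of the subpage backing @addr, counted from the head page. */
static size_t model_subpage_index(unsigned long addr)
{
	return (addr & ~MODEL_PMD_MASK) >> MODEL_PAGE_SHIFT;
}

/* Number of subpages in [addr, end), both assumed page aligned. */
static size_t model_nr_subpages(unsigned long addr, unsigned long end)
{
	return (end - addr) >> MODEL_PAGE_SHIFT;
}
```

Nothing in this computation reads struct page data; it relies entirely on the
assumption that the page behind the PMD entry is the head.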
