On 12/9/20 4:40 AM, John Hubbard wrote:
> On 12/8/20 9:28 AM, Joao Martins wrote:
>> Much like hugetlbfs or THPs, we treat device pagemaps with
>> compound pages like the rest of GUP handling of compound pages.
>>
>> Rather than incrementing the refcount every 4K, we record
>> all sub pages and increment by @refs amount *once*.
>>
>> Performance measured by gup_benchmark improves considerably
>> get_user_pages_fast() and pin_user_pages_fast():
>>
>>   $ gup_benchmark -f /dev/dax0.2 -m 16384 -r 10 -S [-u,-a] -n 512 -w
> 
> "gup_test", now that you're in linux-next, actually.
> 
> (Maybe I'll retrofit that test with getopt_long(), those options are
> getting more elaborate.)
> 
:)

>>
>> (get_user_pages_fast 2M pages) ~75k us -> ~3.6k us
>> (pin_user_pages_fast 2M pages) ~125k us -> ~3.8k us
> 
> That is a beautiful result! I'm very motivated to see if this patchset
> can make it in, in some form.
> 
Cool!

>>
>> Signed-off-by: Joao Martins <[email protected]>
>> ---
>>   mm/gup.c | 67 ++++++++++++++++++++++++++++++++++++++++++--------------
>>   1 file changed, 51 insertions(+), 16 deletions(-)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 98eb8e6d2609..194e6981eb03 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -2250,22 +2250,68 @@ static int gup_pte_range(pmd_t pmd, unsigned long 
>> addr, unsigned long end,
>>   }
>>   #endif /* CONFIG_ARCH_HAS_PTE_SPECIAL */
>>   
>> +
>> +static int record_subpages(struct page *page, unsigned long addr,
>> +                       unsigned long end, struct page **pages)
>> +{
>> +    int nr;
>> +
>> +    for (nr = 0; addr != end; addr += PAGE_SIZE)
>> +            pages[nr++] = page++;
>> +
>> +    return nr;
>> +}
>> +
>>   #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && 
>> defined(CONFIG_TRANSPARENT_HUGEPAGE)
>> -static int __gup_device_huge(unsigned long pfn, unsigned long addr,
>> -                         unsigned long end, unsigned int flags,
>> -                         struct page **pages, int *nr)
>> +static int __gup_device_compound_huge(struct dev_pagemap *pgmap,
>> +                                  struct page *head, unsigned long sz,
> 
> If this variable survives (I see Jason requested a reorg of this math stuff,
> and I also like that idea), then I'd like a slightly better name for "sz".
> 
Yeap.

> I was going to suggest one, but then realized that I can't understand how this
> works. See below...
> 
>> +                                  unsigned long addr, unsigned long end,
>> +                                  unsigned int flags, struct page **pages)
>> +{
>> +    struct page *page;
>> +    int refs;
>> +
>> +    if (!(pgmap->flags & PGMAP_COMPOUND))
>> +            return -1;
> 
> btw, I'm unhappy with returning -1 here and assigning it later to a refs 
> variable.
> (And that will show up even more clearly as an issue if you attempt to make
> refs unsigned everywhere!)
> 
Yes true.

The usage of @refs = -1 (therefore an int) was to differentiate when we are not 
in a
PGMAP_COMPOUND pgmap (and so for logic to keep as today).

Notice that in the PGMAP_COMPOUND case if we fail to grab the head compound 
page we return 0.

> I'm not going to suggest anything because there are a lot of ways to structure
> these routines, and I don't want to overly constrain you. Just please don't 
> assign
> negative values to any refs variables.
> 
OK.

TBH I'm a little afraid this can turn into further complexity if I have to keep 
the
non-compound pgmap around. But I will see how I can adjust this.

>> +
>> +    page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
> 
> If you pass in PMD_SHIFT or PUD_SHIFT for, that's a number-of-bits, isn't it?
> Not a size. And if it's not a size, then sz - 1 doesn't work, does it? If it
> does work, then better naming might help. I'm probably missing a really
> obvious math trick here.

You're right. That was a mistake on my end, indeed. But the mistake wouldn't 
change the
logic, as the PageReference bit only applies to the head page.

        Joao
_______________________________________________
Linux-nvdimm mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to