On Tue, Aug 31, 2021 at 01:34:04PM +0100, Joao Martins wrote:
> On 8/30/21 2:07 PM, Jason Gunthorpe wrote:
> > On Fri, Aug 27, 2021 at 07:34:54PM +0100, Joao Martins wrote:
> >> On 8/27/21 5:25 PM, Jason Gunthorpe wrote:
> >>> On Fri, Aug 27, 2021 at 03:58:13PM +0100, Joao Martins wrote:
> >>>
> >>>> #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) &&
> >>>> defined(CONFIG_TRANSPARENT_HUGEPAGE)
> >>>> static int __gup_device_huge(unsigned long pfn, unsigned long addr,
> >>>> unsigned long end, unsigned int flags,
> >>>> struct page **pages, int *nr)
> >>>> {
> >>>> - int nr_start = *nr;
> >>>> + int refs, nr_start = *nr;
> >>>> struct dev_pagemap *pgmap = NULL;
> >>>> int ret = 1;
> >>>>
> >>>> do {
> >>>> - struct page *page = pfn_to_page(pfn);
> >>>> + struct page *head, *page = pfn_to_page(pfn);
> >>>> + unsigned long next = addr + PAGE_SIZE;
> >>>>
> >>>> pgmap = get_dev_pagemap(pfn, pgmap);
> >>>> if (unlikely(!pgmap)) {
> >>>> @@ -2252,16 +2265,25 @@ static int __gup_device_huge(unsigned long pfn,
> >>>> unsigned long addr,
> >>>> ret = 0;
> >>>> break;
> >>>> }
> >>>> - SetPageReferenced(page);
> >>>> - pages[*nr] = page;
> >>>> - if (unlikely(!try_grab_page(page, flags))) {
> >>>> - undo_dev_pagemap(nr, nr_start, flags, pages);
> >>>> +
> >>>> + head = compound_head(page);
> >>>> + /* @end is assumed to be limited at most one compound
> >>>> page */
> >>>> + if (PageHead(head))
> >>>> + next = end;
> >>>> + refs = record_subpages(page, addr, next, pages + *nr);
> >>>> +
> >>>> + SetPageReferenced(head);
> >>>> + if (unlikely(!try_grab_compound_head(head, refs,
> >>>> flags))) {
> >>>> + if (PageHead(head))
> >>>> + ClearPageReferenced(head);
> >>>> + else
> >>>> + undo_dev_pagemap(nr, nr_start, flags,
> >>>> pages);
> >>>> ret = 0;
> >>>> break;
> >>>
> >>> Why is this special cased for devmap?
> >>>
> >>> Shouldn't everything processing pud/pmds/etc use the same basic loop
> >>> that is similar in idea to the 'for_each_compound_head' scheme in
> >>> unpin_user_pages_dirty_lock()?
> >>>
> >>> Doesn't that work for all the special page type cases here?
> >>
> >> We are iterating over PFNs to create an array of base pages (regardless of
> >> page table
> >> type), rather than iterating over an array of pages to work on.
> >
> > That is part of it, yes, but the slow bit here is to minimally find
> > the head pages and do the atomics on them, much like the
> > unpin_user_pages_dirty_lock()
> >
> > I would think this should be designed similar to how things work on
> > the unpin side.
> >
> I don't think it's the same thing. The bit you say 'minimally find the
> head pages' carries a visible overhead in unpin_user_pages() as we are
> checking each of the pages belongs to the same head page -- because you
> can pass an arbritary set of pages. This does have a cost which is not
> in gup-fast right now AIUI. Whereas in our gup-fast 'handler' you
> already know that you are processing a contiguous chunk of pages.
> If anything, we are closer to unpin_user_page_range*()
> than unpin_user_pages().
Yes, that is what I mean, it is very similar to the range case as we
don't even know that a single compound spans a pud/pmd. So you end up
doing the same loop to find the compound boundaries.
Under GUP slow we can also aggregate multiple page table entires, eg a
split huge page could be procesed as a single 2M range operation even
if it is broken to 4K PTEs.
>
> Switching to similar iteration logic to unpin would look something like
> this (still untested):
>
> for_each_compound_range(index, &page, npages, head, refs) {
> pgmap = get_dev_pagemap(pfn + *nr, pgmap);
I recall talking to DanW about this and we agreed it was unnecessary
here to hold the pgmap and should be deleted.
Jason