Hi,

On Mon, 2026-02-02 at 11:10 +1100, Alistair Popple wrote:
> On 2026-02-02 at 08:07 +1100, Matthew Brost <[email protected]> wrote...
> > On Sun, Feb 01, 2026 at 12:48:33PM -0800, John Hubbard wrote:
> > > On 2/1/26 11:24 AM, Matthew Brost wrote:
> > > > On Sat, Jan 31, 2026 at 01:42:20PM -0800, John Hubbard wrote:
> > > > > On 1/31/26 11:00 AM, Matthew Brost wrote:
> > > > > > On Sat, Jan 31, 2026 at 01:57:21PM +0100, Thomas Hellström wrote:
> > > > > > > On Fri, 2026-01-30 at 19:01 -0800, John Hubbard wrote:
> > > > > > > > On 1/30/26 10:00 AM, Andrew Morton wrote:
> > > > > > > > > On Fri, 30 Jan 2026 15:45:29 +0100 Thomas Hellström wrote:
> > > > > > > >
> > > > > > > > ...
> > > > > >
> > > > > > I'm not convinced the folio refcount has any bearing if we
> > > > > > can take a sleeping lock in do_swap_page, but perhaps I'm
> > > > > > missing something.
>
> I think the point of the trylock vs. lock is that if you can't
> immediately lock the page then it's an indication the page is
> undergoing a migration. In other words there's no point waiting for
> the lock and then trying to call migrate_to_ram() as the page will
> have already moved by the time you acquire the lock. Of course that
> just means you spin faulting until the page finally migrates.
>
> If I'm understanding the problem it sounds like we just want to
> sleep until the migration is complete, ie. same as the migration
> entry path. We don't have a device_private_entry_wait() function,
> but I don't think we need one, see below.
>
> > > > diff --git a/mm/memory.c b/mm/memory.c
> > > > index da360a6eb8a4..1e7ccc4a1a6c 100644
> > > > --- a/mm/memory.c
> > > > +++ b/mm/memory.c
> > > > @@ -4652,6 +4652,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > > >                          vmf->page = softleaf_to_page(entry);
> > > >                          ret = remove_device_exclusive_entry(vmf);
> > > >                  } else if (softleaf_is_device_private(entry)) {
> > > > +                        struct dev_pagemap *pgmap;
> > > > +
> > > >                          if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> > > >                                  /*
> > > >                                   * migrate_to_ram is not yet ready to operate
> > > > @@ -4670,21 +4672,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > > >                                                          vmf->orig_pte)))
> > > >                                  goto unlock;
> > > >
> > > > -                        /*
> > > > -                         * Get a page reference while we know the page can't be
> > > > -                         * freed.
> > > > -                         */
> > > > -                        if (trylock_page(vmf->page)) {
> > > > -                                struct dev_pagemap *pgmap;
> > > > -
> > > > -                                get_page(vmf->page);
>
> At this point we:
> 1. Know the page needs to migrate
> 2. Have the page locked
> 3. Have a reference on the page
> 4. Have the PTL locked
>
> Or in other words we have everything we need to install a migration
> entry, so why not just do that? This thread would then proceed into
> migrate_to_ram() having already done migrate_vma_collect_pmd() for
> the faulting page and any other threads would just sleep in the wait
> on migration entry path until the migration is complete, avoiding
> the livelock problem the trylock was introduced for in 1afaeb8293c9a.
>
> - Alistair
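
If I'm reading the suggestion right, it would amount to something
like the below at the point where the hunk above takes the trylock.
Untested sketch only, to make sure we're talking about the same
thing; it glosses over the TLB, rmap and dirty-bit handling that
migrate_vma_collect_pmd() does, and assumes the single-mapcount,
non-THP case and the existing make_readable_migration_entry() /
page_pgmap() helpers:

        if (trylock_page(vmf->page)) {
                struct dev_pagemap *pgmap;
                swp_entry_t mig_entry;

                get_page(vmf->page);

                /*
                 * We hold the page lock, a page reference and the
                 * PTL, so replace the device-private PTE with a
                 * migration entry before dropping the PTL. Any
                 * concurrent faulter then sleeps in
                 * migration_entry_wait() instead of spinning on the
                 * trylock.
                 */
                mig_entry = make_readable_migration_entry(
                                        page_to_pfn(vmf->page));
                set_pte_at(vmf->vma->vm_mm, vmf->address, vmf->pte,
                           swp_entry_to_pte(mig_entry));

                pte_unmap_unlock(vmf->pte, vmf->ptl);

                pgmap = page_pgmap(vmf->page);
                ret = pgmap->ops->migrate_to_ram(vmf);
                unlock_page(vmf->page);
                put_page(vmf->page);
        }
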
There will always be a small window between when the page is locked
and when we can install a migration entry. If the page has only a
single mapcount, the PTL is held across that window, so the issue
does not occur. But with multiple mapcounts we need to release the
PTL during migration to run try_to_migrate(), and before that the
migrate code runs lru_add_drain_all() and gets stuck.

However, it looks like Matt Brost, inspired by the above, tried
changing the order of try_to_migrate() vs lru_add_drain_all(), which
would cause the fault handler to eventually block on a migration
entry. That seems to solve the issue we're seeing here. A sketch of
that reordering is in the PS below.

Still, I think the spinning on the trylock is something we'd want to
get rid of, because IMO we shouldn't need to adapt otherwise correct
but slightly suboptimal code to a badly behaving path in the fault
handler. In essence, we're never allowed to lock a device-private
folio and then call something that needs to execute on all
processors before unlocking.

Thanks,
Thomas
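
PS: For completeness, this is roughly what I understand the
reordering to look like, sketched against migrate_device_unmap() in
mm/migrate_device.c. Abbreviated and untested; I'm guessing at the
exact shape of Matt's change here, so treat the details below as
illustration only:

        unsigned long i;
        bool allow_drain = true;

        for (i = 0; i < npages; i++) {
                struct page *page = migrate_pfn_to_page(src_pfns[i]);
                struct folio *folio;

                if (!page)
                        continue;
                folio = page_folio(page);

                /*
                 * Unmap first: try_to_migrate() turns all PTEs
                 * mapping the folio into migration entries, so any
                 * CPU faulting on it from here on sleeps in
                 * migration_entry_wait() instead of spin-faulting
                 * on the locked folio.
                 */
                if (folio_mapped(folio))
                        try_to_migrate(folio, 0);

                /*
                 * Only then drain the LRU caches. lru_add_drain_all()
                 * has to run work on every CPU, which can complete
                 * now that no CPU is stuck refaulting on this folio.
                 */
                if (!folio_is_zone_device(folio)) {
                        if (!folio_test_lru(folio) && allow_drain) {
                                lru_add_drain_all();
                                allow_drain = false;
                        }
                        ...
                }
                ...
        }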
