On Mon, 2026-02-02 at 14:28 -0800, John Hubbard wrote:
> On 2/2/26 1:13 AM, Thomas Hellström wrote:
> > On Sat, 2026-01-31 at 13:42 -0800, John Hubbard wrote:
> > > On 1/31/26 11:00 AM, Matthew Brost wrote:
> > > > On Sat, Jan 31, 2026 at 01:57:21PM +0100, Thomas Hellström wrote:
> > > > > On Fri, 2026-01-30 at 19:01 -0800, John Hubbard wrote:
> > > > > > On 1/30/26 10:00 AM, Andrew Morton wrote:
> > > > > > > On Fri, 30 Jan 2026 15:45:29 +0100 Thomas Hellström
> > > > > > > <[email protected]> wrote:
> > > > > > ...
> > > > >
> > > > > I'm also not sure a folio refcount should block migration after
> > > > > the introduction of pinned (as in pin_user_pages()) pages. Rather,
> > > > > perhaps a folio pin-count should block migration, and in that case
> > > > > do_swap_page() can definitely do a sleeping folio lock and the
> > > > > problem is gone.
> > >
> > > A problem for that specific point is that pincount and refcount both
> > > mean "the page is pinned" (which in turn literally means "not allowed
> > > to migrate/move").
> >
> > Yeah, this is what I actually want to challenge, since this is what
> > blocks us from doing a clean, robust solution here. From a brief
> > reading of the docs around the pin-count implementation, I understand
> > it as: "If you want to access the struct page metadata, take a
> > refcount. If you want to access the actual memory of a page, take a
> > pin-count."
> >
> > I guess that might still not be true for all old instances in the
> > kernel using get_user_pages() instead of pin_user_pages() for things
> > like DMA, but perhaps we can set that in stone and document it, at
> > least for device-private pages for now, which would be sufficient for
> > the do_swap_page() refcount not to block migration.
>
> It's an interesting direction to go...
>
> >
> > > (In fact, pincount is implemented in terms of refcount, in most
> > > configurations still.)
> >
> > Yes, but that's only a space optimization never intended to conflict,
> > right? Meaning a pin-count will imply a refcount, but a refcount will
> > never imply a pin-count?
>
> Unfortunately, they are more tightly linked than that today, at least
> until someday when specialized folios are everywhere (at which point
> pincount gets its own field).
>
> Until then, it's not just a "space optimization", it's "overload
> refcount to also do pincounting". And "let core mm continue to treat
> refcounts as meaning that the page is pinned".
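To make the rule I floated above concrete (refcount for the struct page
metadata, pin-count for the memory itself), here is roughly what the two
cases look like from a caller's point of view. This is only a sketch for
discussion, not part of the patch: pin_user_pages_fast(),
unpin_user_pages(), get_page() and put_page() are the existing APIs; the
two callers are made up.

/* Case 1: only struct page metadata is needed, so a refcount suffices. */
static void inspect_metadata(struct page *page)
{
	get_page(page);
	/* ... look at page/folio state only, never the page contents ... */
	put_page(page);
}

/* Case 2: the memory itself is accessed (DMA etc.), so take a pin. */
static int touch_user_buffer(unsigned long uaddr, struct page **pages,
			     int nr_pages)
{
	/* May pin fewer than nr_pages pages; good enough for a sketch. */
	int pinned = pin_user_pages_fast(uaddr, nr_pages, FOLL_WRITE, pages);

	if (pinned <= 0)
		return pinned ? pinned : -EFAULT;

	/*
	 * ... access the memory. Under the proposed rule, migration of
	 * these pages is blocked until the unpin below, whereas the plain
	 * refcount in case 1 never blocks it.
	 */

	unpin_user_pages(pages, pinned);
	return 0;
}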
So this is what I had in mind. I think this would certainly work
regardless of whether pincount is implemented by means of a refcount
bias or not, and AFAICT it's also consistent with
https://docs.kernel.org/core-api/pin_user_pages.html

But it would not work if some part of core mm grabs a page refcount and
*expects* that to pin the page, in the sense that it should not be
migrated. But you're suggesting that's actually the case?

Thanks,
Thomas

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index a101a187e6da..c07a79995128 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -534,33 +534,15 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
  * migrate_vma_check_page() - check if page is pinned or not
  * @page: struct page to check
  *
- * Pinned pages cannot be migrated. This is the same test as in
- * folio_migrate_mapping(), except that here we allow migration of a
- * ZONE_DEVICE page.
+ * Pinned pages cannot be migrated.
  */
 static bool migrate_vma_check_page(struct page *page, struct page *fault_page)
 {
 	struct folio *folio = page_folio(page);
-	/*
-	 * One extra ref because caller holds an extra reference, either from
-	 * folio_isolate_lru() for a regular folio, or migrate_vma_collect() for
-	 * a device folio.
-	 */
-	int extra = 1 + (page == fault_page);
-
-	/* Page from ZONE_DEVICE have one extra reference */
-	if (folio_is_zone_device(folio))
-		extra++;
-
-	/* For file back page */
-	if (folio_mapping(folio))
-		extra += 1 + folio_has_private(folio);
-
-	if ((folio_ref_count(folio) - extra) > folio_mapcount(folio))
-		return false;
+	VM_WARN_ON_FOLIO(folio_test_lru(folio) || folio_mapped(folio), folio);
 
-	return true;
+	return !folio_maybe_dma_pinned(folio);
 }
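As an aside on the "maybe" in folio_maybe_dma_pinned(): for small folios
the pin is encoded as a refcount bias, so the check can return false
positives (a folio with very many plain references looks pinned) but
never false negatives. For migrate_device that is the safe direction,
since the worst case is that we occasionally refuse to migrate a folio
that isn't really pinned. Simplified sketch of the check, from my
reading of include/linux/mm.h, not verbatim:

static inline bool maybe_dma_pinned_sketch(struct folio *folio)
{
	/* Large folios track pins exactly, in a dedicated counter. */
	if (folio_test_large(folio))
		return atomic_read(&folio->_pincount) > 0;

	/*
	 * Small folios add GUP_PIN_COUNTING_BIAS (1024) references per
	 * pin, so ~1024 plain references are indistinguishable from one
	 * pin.
	 */
	return folio_ref_count(folio) >= GUP_PIN_COUNTING_BIAS;
}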
