On Mon, 2026-02-02 at 14:28 -0800, John Hubbard wrote:
> On 2/2/26 1:13 AM, Thomas Hellström wrote:
> > On Sat, 2026-01-31 at 13:42 -0800, John Hubbard wrote:
> > > On 1/31/26 11:00 AM, Matthew Brost wrote:
> > > > On Sat, Jan 31, 2026 at 01:57:21PM +0100, Thomas Hellström wrote:
> > > > > On Fri, 2026-01-30 at 19:01 -0800, John Hubbard wrote:
> > > > > > On 1/30/26 10:00 AM, Andrew Morton wrote:
> > > > > > > On Fri, 30 Jan 2026 15:45:29 +0100 Thomas Hellström
> > > > > > > <[email protected]> wrote:
> > > > > > ...
> > > > >
> > > > > I'm also not sure a folio refcount should block migration after
> > > > > the introduction of pinned (as in pin_user_pages()) pages. Rather,
> > > > > perhaps a folio pin-count should block migration, and in that case
> > > > > do_swap_page() can definitely do a sleeping folio lock and the
> > > > > problem is gone.
> > >
> > > A problem for that specific point is that pincount and refcount both
> > > mean "the page is pinned" (which in turn literally means "not allowed
> > > to migrate/move").
> >
> > Yeah, this is what I actually want to challenge, since this is what
> > blocks us from doing a clean, robust solution here. From a brief
> > reading of the docs around the pin-count implementation, I understand
> > it as: "If you want to access the struct page metadata, take a
> > refcount. If you want to access the actual memory of a page, take a
> > pin-count."
> >
> > I guess that might still not be true for all old instances in the
> > kernel using get_user_pages() instead of pin_user_pages() for things
> > like DMA, but perhaps we can set that in stone and document it, at
> > least for device-private pages for now, which would be sufficient for
> > the do_swap_page() refcount not to block migration.
>
> It's an interesting direction to go...
>
> >
> > > (In fact, pincount is implemented in terms of refcount, in most
> > > configurations still.)
> >
> > Yes, but that's only a space optimization never intended to conflict,
> > right? Meaning a pin-count will imply a refcount, but a refcount will
> > never imply a pin-count?
>
> Unfortunately, they are more tightly linked than that today, at least
> until someday when specialized folios are everywhere (at which point
> pincount gets its own field).
>
> Until then, it's not just a "space optimization", it's "overload
> refcount to also do pincounting". And "let core mm continue to treat
> refcounts as meaning that the page is pinned".
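To make the rule I floated above concrete (refcount for the struct page
metadata, pin-count for the memory itself), here is roughly what the two
cases look like from a caller's point of view. This is only a sketch for
discussion, not part of the patch: pin_user_pages_fast(),
unpin_user_pages(), get_page() and put_page() are the existing APIs; the
two callers are made up.

/* Case 1: only struct page metadata is needed, so a refcount suffices. */
static void inspect_metadata(struct page *page)
{
	get_page(page);
	/* ... look at page/folio state only, never the page contents ... */
	put_page(page);
}

/* Case 2: the memory itself is accessed (DMA etc.), so take a pin. */
static int touch_user_buffer(unsigned long uaddr, struct page **pages,
			     int nr_pages)
{
	/* May pin fewer than nr_pages pages; good enough for a sketch. */
	int pinned = pin_user_pages_fast(uaddr, nr_pages, FOLL_WRITE, pages);

	if (pinned <= 0)
		return pinned ? pinned : -EFAULT;

	/*
	 * ... access the memory. Under the proposed rule, migration of
	 * these pages is blocked until the unpin below, whereas the plain
	 * refcount in case 1 never blocks it.
	 */

	unpin_user_pages(pages, pinned);
	return 0;
}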
So this is what I had in mind. I think this would certainly work
regardless of whether pincount is implemented by means of a refcount
bias or not, and AFAICT it's also consistent with
https://docs.kernel.org/core-api/pin_user_pages.html

But it would not work if some part of core mm grabs a page refcount and
*expects* that to pin the page, in the sense that it should not be
migrated. But you're suggesting that's actually the case?

Thanks,
Thomas

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index a101a187e6da..c07a79995128 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -534,33 +534,15 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
  * migrate_vma_check_page() - check if page is pinned or not
  * @page: struct page to check
  *
- * Pinned pages cannot be migrated. This is the same test as in
- * folio_migrate_mapping(), except that here we allow migration of a
- * ZONE_DEVICE page.
+ * Pinned pages cannot be migrated.
  */
 static bool migrate_vma_check_page(struct page *page, struct page *fault_page)
 {
 	struct folio *folio = page_folio(page);
-	/*
-	 * One extra ref because caller holds an extra reference, either from
-	 * folio_isolate_lru() for a regular folio, or migrate_vma_collect() for
-	 * a device folio.
-	 */
-	int extra = 1 + (page == fault_page);
-
-	/* Page from ZONE_DEVICE have one extra reference */
-	if (folio_is_zone_device(folio))
-		extra++;
-
-	/* For file back page */
-	if (folio_mapping(folio))
-		extra += 1 + folio_has_private(folio);
-
-	if ((folio_ref_count(folio) - extra) > folio_mapcount(folio))
-		return false;
+	VM_WARN_ON_FOLIO(folio_test_lru(folio) || folio_mapped(folio), folio);
 
-	return true;
+	return !folio_maybe_dma_pinned(folio);
 }
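As an aside on the "maybe" in folio_maybe_dma_pinned(): for small folios
the pin is encoded as a refcount bias, so the check can return false
positives (a folio with very many plain references looks pinned) but
never false negatives. For migrate_device that is the safe direction,
since the worst case is that we occasionally refuse to migrate a folio
that isn't really pinned. Simplified sketch of the check, from my
reading of include/linux/mm.h, not verbatim:

static inline bool maybe_dma_pinned_sketch(struct folio *folio)
{
	/* Large folios track pins exactly, in a dedicated counter. */
	if (folio_test_large(folio))
		return atomic_read(&folio->_pincount) > 0;

	/*
	 * Small folios add GUP_PIN_COUNTING_BIAS (1024) references per
	 * pin, so ~1024 plain references are indistinguishable from one
	 * pin.
	 */
	return folio_ref_count(folio) >= GUP_PIN_COUNTING_BIAS;
}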
