On 1/8/26 08:03, Zi Yan wrote:
> On 7 Jan 2026, at 16:15, Matthew Brost wrote:
>
>> On Wed, Jan 07, 2026 at 03:38:35PM -0500, Zi Yan wrote:
>>> On 7 Jan 2026, at 15:20, Zi Yan wrote:
>>>
>>>> +THP folks
>>>
>>> +willy, since he commented in another thread.
>>>
>>>>
>>>> On 16 Dec 2025, at 15:10, Francois Dugast wrote:
>>>>
>>>>> From: Matthew Brost <[email protected]>
>>>>>
>>>>> Introduce migrate_device_split_page() to split a device page into
>>>>> lower-order pages. Used when a folio allocated as higher-order is freed
>>>>> and later reallocated at a smaller order by the driver memory manager.
>>>>>
>>>>> Cc: Andrew Morton <[email protected]>
>>>>> Cc: Balbir Singh <[email protected]>
>>>>> Cc: [email protected]
>>>>> Cc: [email protected]
>>>>> Signed-off-by: Matthew Brost <[email protected]>
>>>>> Signed-off-by: Francois Dugast <[email protected]>
>>>>> ---
>>>>>  include/linux/huge_mm.h |  3 +++
>>>>>  include/linux/migrate.h |  1 +
>>>>>  mm/huge_memory.c        |  6 ++---
>>>>>  mm/migrate_device.c     | 49 +++++++++++++++++++++++++++++++++++++++++
>>>>>  4 files changed, 56 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>> index a4d9f964dfde..6ad8f359bc0d 100644
>>>>> --- a/include/linux/huge_mm.h
>>>>> +++ b/include/linux/huge_mm.h
>>>>> @@ -374,6 +374,9 @@ int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list
>>>>>  int folio_split_unmapped(struct folio *folio, unsigned int new_order);
>>>>>  unsigned int min_order_for_split(struct folio *folio);
>>>>>  int split_folio_to_list(struct folio *folio, struct list_head *list);
>>>>> +int __split_unmapped_folio(struct folio *folio, int new_order,
>>>>> +		struct page *split_at, struct xa_state *xas,
>>>>> +		struct address_space *mapping, enum split_type split_type);
>>>>>  int folio_check_splittable(struct folio *folio, unsigned int new_order,
>>>>>  		enum split_type split_type);
>>>>>  int folio_split(struct folio *folio, unsigned int new_order, struct page *page,
>>>>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>>>>> index 26ca00c325d9..ec65e4fd5f88 100644
>>>>> --- a/include/linux/migrate.h
>>>>> +++ b/include/linux/migrate.h
>>>>> @@ -192,6 +192,7 @@ void migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns,
>>>>>  		unsigned long npages);
>>>>>  void migrate_device_finalize(unsigned long *src_pfns,
>>>>>  		unsigned long *dst_pfns, unsigned long npages);
>>>>> +int migrate_device_split_page(struct page *page);
>>>>>
>>>>>  #endif /* CONFIG_MIGRATION */
>>>>>
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 40cf59301c21..7ded35a3ecec 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -3621,9 +3621,9 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>>>>>   * Return: 0 - successful, <0 - failed (if -ENOMEM is returned, @folio might be
>>>>>   * split but not to @new_order, the caller needs to check)
>>>>>   */
>>>>> -static int __split_unmapped_folio(struct folio *folio, int new_order,
>>>>> -		struct page *split_at, struct xa_state *xas,
>>>>> -		struct address_space *mapping, enum split_type split_type)
>>>>> +int __split_unmapped_folio(struct folio *folio, int new_order,
>>>>> +		struct page *split_at, struct xa_state *xas,
>>>>> +		struct address_space *mapping, enum split_type split_type)
>>>>>  {
>>>>>  	const bool is_anon = folio_test_anon(folio);
>>>>>  	int old_order = folio_order(folio);
>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>> index 23379663b1e1..eb0f0e938947 100644
>>>>> --- a/mm/migrate_device.c
>>>>> +++ b/mm/migrate_device.c
>>>>> @@ -775,6 +775,49 @@ int migrate_vma_setup(struct migrate_vma *args)
>>>>>  EXPORT_SYMBOL(migrate_vma_setup);
>>>>>
>>>>>  #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>>>>> +/**
>>>>> + * migrate_device_split_page() - Split device page
>>>>> + * @page: Device page to split
>>>>> + *
>>>>> + * Splits a device page into smaller pages. Typically called when reallocating a
>>>>> + * folio to a smaller size. Inherently racy—only safe if the caller ensures
>>>>> + * mutual exclusion within the page's folio (i.e., no other threads are using
>>>>> + * pages within the folio). Expected to be called on a free device page and
>>>>> + * restores all split out pages to a free state.
>>>>> + */
>>>
>>> Do you mind explaining why __split_unmapped_folio() is needed for a free device
>>> page? A free page is not supposed to be a large folio, at least from a core
>>> MM point of view. __split_unmapped_folio() is intended to work on large folios
>>> (or compound pages), even if the input folio has refcount == 0 (because it is
>>> frozen).
>>>
>>
>> Well, then maybe this is a bug in core MM where the freed page is still
>> a THP. Let me explain the scenario and why this is needed from my POV.
>>
>> Our VRAM allocator in Xe (and several other DRM drivers) is DRM buddy.
>> This is a shared pool between traditional DRM GEMs (buffer objects) and
>> SVM allocations (pages). It doesn’t have any view of the page backing—it
>> basically just hands back a pointer to VRAM space that we allocate from.
>> From that, if it’s an SVM allocation, we can derive the device pages.
>>
>> What I see happening is: a 2M buddy allocation occurs, we make the
>> backing device pages a large folio, and sometime later the folio
>> refcount goes to zero and we free the buddy allocation. Later, the buddy
>> allocation is reused for a smaller allocation (e.g., 4K or 64K), but the
>> backing pages are still a large folio. Here is where we need to split
>
> I agree with you that it might be a bug in free_zone_device_folio() based
> on my understanding. Since zone_device_page_init() calls prep_compound_page()
> for >0 orders, but free_zone_device_folio() never reverses the process.
>
> Balbir and Alistair might be able to help here.
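To spell out the asymmetry being described, here is a rough paraphrase of the
two paths as I read them (heavily abbreviated, not the exact source; only the
calls that matter for this discussion are shown):

void zone_device_page_init(struct page *page, unsigned int order)
{
	/* ... refcount and lock setup elided ... */
	if (order)
		prep_compound_page(page, order);	/* range becomes a large folio */
}

void free_zone_device_folio(struct folio *folio)
{
	/* pgmap derived from the folio; mapping teardown, stats and
	 * per pgmap->type handling elided */
	pgmap->ops->folio_free(folio);

	/*
	 * Nothing here undoes prep_compound_page(): the tail pages keep their
	 * compound_head linkage and the head keeps its large-folio metadata,
	 * so a later zone_device_page_init() on the same range at a smaller
	 * order starts from stale compound state.
	 */
}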
I agree it's an API limitation.

>
> I cherry picked the code from __free_frozen_pages() to reverse the process.
> Can you give it a try to see if it solves the above issue? Thanks.
>
> From 3aa03baa39b7e62ea079e826de6ed5aab3061e46 Mon Sep 17 00:00:00 2001
> From: Zi Yan <[email protected]>
> Date: Wed, 7 Jan 2026 16:49:52 -0500
> Subject: [PATCH] mm/memremap: free device private folio fix
> Content-Type: text/plain; charset="utf-8"
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
>  mm/memremap.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 63c6ab4fdf08..483666ff7271 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -475,6 +475,21 @@ void free_zone_device_folio(struct folio *folio)
>  		pgmap->ops->folio_free(folio);
>  		break;
>  	}
> +
> +	if (nr > 1) {
> +		struct page *head = folio_page(folio, 0);
> +
> +		head[1].flags.f &= ~PAGE_FLAGS_SECOND;
> +#ifdef NR_PAGES_IN_LARGE_FOLIO
> +		folio->_nr_pages = 0;
> +#endif
> +		for (i = 1; i < nr; i++) {
> +			(head + i)->mapping = NULL;
> +			clear_compound_head(head + i);

I see that you're skipping the checks in free_tail_page_prepare()? IIUC, we
should be able to invoke it even for zone device private pages (rough sketch
at the end of this mail).

> +		}
> +		folio->mapping = NULL;

This is already done in free_zone_device_folio().

> +		head->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;

I don't think this is required for zone device private folios, but I suppose
it keeps the code generic.

> +	}
>  }
>
>  void zone_device_page_init(struct page *page, unsigned int order)

Otherwise, it seems like the right way to solve the issue.

Balbir
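P.S. To make the suggestion above concrete, a rough sketch of what the added
block could reduce to, assuming free_tail_page_prepare() were made callable
from mm/memremap.c (today it is static to mm/page_alloc.c); nr, folio and i
are as in the hunk above:

	if (nr > 1) {
		struct page *head = folio_page(folio, 0);

		/* As in __free_frozen_pages(): clear the second-page flags and
		 * the stored folio size before touching the tail pages. */
		head[1].flags.f &= ~PAGE_FLAGS_SECOND;
#ifdef NR_PAGES_IN_LARGE_FOLIO
		folio->_nr_pages = 0;
#endif
		/* free_tail_page_prepare() runs the tail-page sanity checks and
		 * already resets ->mapping and the compound_head linkage, so
		 * the open-coded loop body goes away. */
		for (i = 1; i < nr; i++)
			free_tail_page_prepare(head, head + i);

		/* folio->mapping is already cleared earlier in this function;
		 * the head-flags reset is kept for symmetry with the buddy path. */
		head->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
	}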
