On 1/15/26 18:07, Alistair Popple wrote: > On 2026-01-15 at 17:18 +1100, Matthew Brost <[email protected]> wrote... >> On Wed, Jan 14, 2026 at 09:57:31PM -0800, Matthew Brost wrote: >>> On Thu, Jan 15, 2026 at 04:27:26PM +1100, Alistair Popple wrote: >>>> On 2026-01-15 at 06:19 +1100, Francois Dugast <[email protected]> >>>> wrote... >>>>> From: Matthew Brost <[email protected]> >>>>> >>>>> Reinitialize metadata for large zone device private folios in >>>>> zone_device_page_init prior to creating a higher-order zone device >>>>> private folio. This step is necessary when the folio’s order changes >>>>> dynamically between zone_device_page_init calls to avoid building a >>>>> corrupt folio. As part of the metadata reinitialization, the dev_pagemap >>>>> must be passed in from the caller because the pgmap stored in the folio >>>>> page may have been overwritten with a compound head. >>>> >>>> Thanks for fixing, a couple of minor comments below. >>>> >>>>> Cc: Zi Yan <[email protected]> >>>>> Cc: Alistair Popple <[email protected]> >>>>> Cc: adhavan Srinivasan <[email protected]> >>>>> Cc: Nicholas Piggin <[email protected]> >>>>> Cc: Michael Ellerman <[email protected]> >>>>> Cc: "Christophe Leroy (CS GROUP)" <[email protected]> >>>>> Cc: Felix Kuehling <[email protected]> >>>>> Cc: Alex Deucher <[email protected]> >>>>> Cc: "Christian König" <[email protected]> >>>>> Cc: David Airlie <[email protected]> >>>>> Cc: Simona Vetter <[email protected]> >>>>> Cc: Maarten Lankhorst <[email protected]> >>>>> Cc: Maxime Ripard <[email protected]> >>>>> Cc: Thomas Zimmermann <[email protected]> >>>>> Cc: Lyude Paul <[email protected]> >>>>> Cc: Danilo Krummrich <[email protected]> >>>>> Cc: David Hildenbrand <[email protected]> >>>>> Cc: Oscar Salvador <[email protected]> >>>>> Cc: Andrew Morton <[email protected]> >>>>> Cc: Jason Gunthorpe <[email protected]> >>>>> Cc: Leon Romanovsky <[email protected]> >>>>> Cc: Lorenzo Stoakes <[email protected]> >>>>> Cc: Liam R. Howlett <[email protected]> >>>>> Cc: Vlastimil Babka <[email protected]> >>>>> Cc: Mike Rapoport <[email protected]> >>>>> Cc: Suren Baghdasaryan <[email protected]> >>>>> Cc: Michal Hocko <[email protected]> >>>>> Cc: Balbir Singh <[email protected]> >>>>> Cc: [email protected] >>>>> Cc: [email protected] >>>>> Cc: [email protected] >>>>> Cc: [email protected] >>>>> Cc: [email protected] >>>>> Cc: [email protected] >>>>> Cc: [email protected] >>>>> Cc: [email protected] >>>>> Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private >>>>> folios") >>>>> Signed-off-by: Matthew Brost <[email protected]> >>>>> Signed-off-by: Francois Dugast <[email protected]> >>>>> --- >>>>> arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- >>>>> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- >>>>> drivers/gpu/drm/drm_pagemap.c | 2 +- >>>>> drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- >>>>> include/linux/memremap.h | 9 ++++++--- >>>>> lib/test_hmm.c | 4 +++- >>>>> mm/memremap.c | 20 +++++++++++++++++++- >>>>> 7 files changed, 32 insertions(+), 9 deletions(-) >>>>> >>>>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c >>>>> b/arch/powerpc/kvm/book3s_hv_uvmem.c >>>>> index e5000bef90f2..7cf9310de0ec 100644 >>>>> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c >>>>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c >>>>> @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned >>>>> long gpa, struct kvm *kvm) >>>>> >>>>> dpage = pfn_to_page(uvmem_pfn); >>>>> dpage->zone_device_data = pvt; >>>>> - zone_device_page_init(dpage, 0); >>>>> + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); >>>>> return dpage; >>>>> out_clear: >>>>> spin_lock(&kvmppc_uvmem_bitmap_lock); >>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c >>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c >>>>> index af53e796ea1b..6ada7b4af7c6 100644 >>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c >>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c >>>>> @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, >>>>> unsigned long pfn) >>>>> page = pfn_to_page(pfn); >>>>> svm_range_bo_ref(prange->svm_bo); >>>>> page->zone_device_data = prange->svm_bo; >>>>> - zone_device_page_init(page, 0); >>>>> + zone_device_page_init(page, page_pgmap(page), 0); >>>>> } >>>>> >>>>> static void >>>>> diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c >>>>> index 03ee39a761a4..c497726b0147 100644 >>>>> --- a/drivers/gpu/drm/drm_pagemap.c >>>>> +++ b/drivers/gpu/drm/drm_pagemap.c >>>>> @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page >>>>> *page, >>>>> struct drm_pagemap_zdd *zdd) >>>>> { >>>>> page->zone_device_data = drm_pagemap_zdd_get(zdd); >>>>> - zone_device_page_init(page, 0); >>>>> + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); >>>>> } >>>>> >>>>> /** >>>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c >>>>> b/drivers/gpu/drm/nouveau/nouveau_dmem.c >>>>> index 58071652679d..3d8031296eed 100644 >>>>> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c >>>>> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c >>>>> @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm >>>>> *drm, bool is_large) >>>>> order = ilog2(DMEM_CHUNK_NPAGES); >>>>> } >>>>> >>>>> - zone_device_folio_init(folio, order); >>>>> + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); >>>>> return page; >>>>> } >>>>> >>>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h >>>>> index 713ec0435b48..e3c2ccf872a8 100644 >>>>> --- a/include/linux/memremap.h >>>>> +++ b/include/linux/memremap.h >>>>> @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page >>>>> *page) >>>>> } >>>>> >>>>> #ifdef CONFIG_ZONE_DEVICE >>>>> -void zone_device_page_init(struct page *page, unsigned int order); >>>>> +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, >>>>> + unsigned int order); >>>>> void *memremap_pages(struct dev_pagemap *pgmap, int nid); >>>>> void memunmap_pages(struct dev_pagemap *pgmap); >>>>> void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); >>>>> @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, >>>>> unsigned long pfn); >>>>> >>>>> unsigned long memremap_compat_align(void); >>>>> >>>>> -static inline void zone_device_folio_init(struct folio *folio, unsigned >>>>> int order) >>>>> +static inline void zone_device_folio_init(struct folio *folio, >>>>> + struct dev_pagemap *pgmap, >>>>> + unsigned int order) >>>>> { >>>>> - zone_device_page_init(&folio->page, order); >>>>> + zone_device_page_init(&folio->page, pgmap, order); >>>>> if (order) >>>>> folio_set_large_rmappable(folio); >>>>> } >>>>> diff --git a/lib/test_hmm.c b/lib/test_hmm.c >>>>> index 8af169d3873a..455a6862ae50 100644 >>>>> --- a/lib/test_hmm.c >>>>> +++ b/lib/test_hmm.c >>>>> @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct >>>>> dmirror *dmirror, >>>>> goto error; >>>>> } >>>>> >>>>> - zone_device_folio_init(page_folio(dpage), order); >>>>> + zone_device_folio_init(page_folio(dpage), >>>>> + page_pgmap(folio_page(page_folio(dpage), 0)), >>>>> + order); >>>>> dpage->zone_device_data = rpage; >>>>> return dpage; >>>>> >>>>> diff --git a/mm/memremap.c b/mm/memremap.c >>>>> index 63c6ab4fdf08..6f46ab14662b 100644 >>>>> --- a/mm/memremap.c >>>>> +++ b/mm/memremap.c >>>>> @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) >>>>> } >>>>> } >>>>> >>>>> -void zone_device_page_init(struct page *page, unsigned int order) >>>>> +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, >>>>> + unsigned int order) >>>>> { >>>>> + struct page *new_page = page; >>>>> + unsigned int i; >>>>> + >>>>> VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); >>>>> >>>>> + for (i = 0; i < (1UL << order); ++i, ++new_page) { >>>>> + struct folio *new_folio = (struct folio *)new_page; >>>>> + >>>>> + new_page->flags.f &= ~0xffUL; /* Clear possible order, page >>>>> head */ >>>> >>>> This seems odd to me, mainly due to the "magic" number. Why not just clear >>>> the flags entirely? Or at least explicitly just clear the flags you care >>>> about >>>> which would remove the need for the comment and should let you use the >>>> proper >>>> PageFlag functions. >>>> >>> >>> I'm copying this from folio_reset_order [1]. My paranoia about touching >>> anything related to struct page is high, so I did the same thing >>> folio_reset_order does here. > > So why not just use folio_reset_order() below? > >>> >>> [1] https://elixir.bootlin.com/linux/v6.18.5/source/include/linux/mm.h#L1075 >>> >> >> This immediately hangs my first SVM test... >> >> diff --git a/mm/memremap.c b/mm/memremap.c >> index 6f46ab14662b..ef8c56876cf5 100644 >> --- a/mm/memremap.c >> +++ b/mm/memremap.c >> @@ -488,7 +488,7 @@ void zone_device_page_init(struct page *page, struct >> dev_pagemap *pgmap, >> for (i = 0; i < (1UL << order); ++i, ++new_page) { >> struct folio *new_folio = (struct folio *)new_page; >> >> - new_page->flags.f &= ~0xffUL; /* Clear possible order, >> page head */ >> + new_page->flags.f = 0; >> #ifdef NR_PAGES_IN_LARGE_FOLIO >> ((struct folio *)(new_page - 1))->_nr_pages = 0; > > This seems wrong to me - I saw your reply to Balbir but for an order-0 page > isn't this going to access a completely different, possibly already allocated, > page? > >> #endif >> >> I can walk through exactly which flags need to be cleared, but my >> feeling is that likely any flag that the order field overloads and can >> possibly encode should be cleared—so bits 0–7 based on the existing >> code. >> >> How about in a follow-up we normalize setting / clearing the order flag >> field with a #define and an inline helper? > > Ie: Would something like the following work: > > ClearPageHead(new_page); > clear_compound_head(new_page); > folio_reset_order(new_folio); > > Which would also deal with setting _nr_pages. >
I thought about this, but folio_reset_order works only for larger folios otherwise there is a VM_WARN_ON. >> Matt >> >>>>> +#ifdef NR_PAGES_IN_LARGE_FOLIO >>>>> + ((struct folio *)(new_page - 1))->_nr_pages = 0; >>>>> +#endif >>>>> + new_folio->mapping = NULL; >>>>> + new_folio->pgmap = pgmap; /* Also clear compound head */ >>>>> + new_folio->share = 0; /* fsdax only, unused for device >>>>> private */ >>>> >>>> It would be nice if the FS DAX code actually used this as well. Is there a >>>> reason that change was dropped from the series? >>>> >>> >>> I don't have a test platform for FS DAX. In prior revisions, I was just >>> moving existing FS DAX code to a helper, which I felt confident about. >>> >>> This revision is slightly different, and I don't feel comfortable >>> modifying FS DAX code without a test platform. I agree we should update >>> FS DAX, but that should be done in a follow-up with coordinated testing. > > Fair enough, I figured something like this might be your answer :-) You > could update it and ask people with access to such a system to test it though > (unfortunately my setup has bit-rotted beyond repair). > > But I'm ok leaving to for a future change. > >>> >>> Matt >>> >>>>> + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); >>>>> + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); >>>>> + } >>>>> + >>>>> /* >>>>> * Drivers shouldn't be allocating pages after calling >>>>> * memunmap_pages(). >>>>> -- >>>>> 2.43.0 >>>>>
