On 2026-01-15 at 06:19 +1100, Francois Dugast <[email protected]> wrote... > From: Matthew Brost <[email protected]> > > Reinitialize metadata for large zone device private folios in > zone_device_page_init prior to creating a higher-order zone device > private folio. This step is necessary when the folio’s order changes > dynamically between zone_device_page_init calls to avoid building a > corrupt folio. As part of the metadata reinitialization, the dev_pagemap > must be passed in from the caller because the pgmap stored in the folio > page may have been overwritten with a compound head.
Thanks for fixing, a couple of minor comments below. > Cc: Zi Yan <[email protected]> > Cc: Alistair Popple <[email protected]> > Cc: adhavan Srinivasan <[email protected]> > Cc: Nicholas Piggin <[email protected]> > Cc: Michael Ellerman <[email protected]> > Cc: "Christophe Leroy (CS GROUP)" <[email protected]> > Cc: Felix Kuehling <[email protected]> > Cc: Alex Deucher <[email protected]> > Cc: "Christian König" <[email protected]> > Cc: David Airlie <[email protected]> > Cc: Simona Vetter <[email protected]> > Cc: Maarten Lankhorst <[email protected]> > Cc: Maxime Ripard <[email protected]> > Cc: Thomas Zimmermann <[email protected]> > Cc: Lyude Paul <[email protected]> > Cc: Danilo Krummrich <[email protected]> > Cc: David Hildenbrand <[email protected]> > Cc: Oscar Salvador <[email protected]> > Cc: Andrew Morton <[email protected]> > Cc: Jason Gunthorpe <[email protected]> > Cc: Leon Romanovsky <[email protected]> > Cc: Lorenzo Stoakes <[email protected]> > Cc: Liam R. Howlett <[email protected]> > Cc: Vlastimil Babka <[email protected]> > Cc: Mike Rapoport <[email protected]> > Cc: Suren Baghdasaryan <[email protected]> > Cc: Michal Hocko <[email protected]> > Cc: Balbir Singh <[email protected]> > Cc: [email protected] > Cc: [email protected] > Cc: [email protected] > Cc: [email protected] > Cc: [email protected] > Cc: [email protected] > Cc: [email protected] > Cc: [email protected] > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private > folios") > Signed-off-by: Matthew Brost <[email protected]> > Signed-off-by: Francois Dugast <[email protected]> > --- > arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +- > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > drivers/gpu/drm/drm_pagemap.c | 2 +- > drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +- > include/linux/memremap.h | 9 ++++++--- > lib/test_hmm.c | 4 +++- > mm/memremap.c | 20 +++++++++++++++++++- > 7 files changed, 32 insertions(+), 9 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c > b/arch/powerpc/kvm/book3s_hv_uvmem.c > index e5000bef90f2..7cf9310de0ec 100644 > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c > @@ -723,7 +723,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long > gpa, struct kvm *kvm) > > dpage = pfn_to_page(uvmem_pfn); > dpage->zone_device_data = pvt; > - zone_device_page_init(dpage, 0); > + zone_device_page_init(dpage, &kvmppc_uvmem_pgmap, 0); > return dpage; > out_clear: > spin_lock(&kvmppc_uvmem_bitmap_lock); > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > index af53e796ea1b..6ada7b4af7c6 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > @@ -217,7 +217,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, > unsigned long pfn) > page = pfn_to_page(pfn); > svm_range_bo_ref(prange->svm_bo); > page->zone_device_data = prange->svm_bo; > - zone_device_page_init(page, 0); > + zone_device_page_init(page, page_pgmap(page), 0); > } > > static void > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c > index 03ee39a761a4..c497726b0147 100644 > --- a/drivers/gpu/drm/drm_pagemap.c > +++ b/drivers/gpu/drm/drm_pagemap.c > @@ -201,7 +201,7 @@ static void drm_pagemap_get_devmem_page(struct page *page, > struct drm_pagemap_zdd *zdd) > { > page->zone_device_data = drm_pagemap_zdd_get(zdd); > - zone_device_page_init(page, 0); > + zone_device_page_init(page, zdd->dpagemap->pagemap, 0); > } > > /** > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c > b/drivers/gpu/drm/nouveau/nouveau_dmem.c > index 58071652679d..3d8031296eed 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c > @@ -425,7 +425,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm, > bool is_large) > order = ilog2(DMEM_CHUNK_NPAGES); > } > > - zone_device_folio_init(folio, order); > + zone_device_folio_init(folio, page_pgmap(folio_page(folio, 0)), order); > return page; > } > > diff --git a/include/linux/memremap.h b/include/linux/memremap.h > index 713ec0435b48..e3c2ccf872a8 100644 > --- a/include/linux/memremap.h > +++ b/include/linux/memremap.h > @@ -224,7 +224,8 @@ static inline bool is_fsdax_page(const struct page *page) > } > > #ifdef CONFIG_ZONE_DEVICE > -void zone_device_page_init(struct page *page, unsigned int order); > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > + unsigned int order); > void *memremap_pages(struct dev_pagemap *pgmap, int nid); > void memunmap_pages(struct dev_pagemap *pgmap); > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); > @@ -234,9 +235,11 @@ bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned > long pfn); > > unsigned long memremap_compat_align(void); > > -static inline void zone_device_folio_init(struct folio *folio, unsigned int > order) > +static inline void zone_device_folio_init(struct folio *folio, > + struct dev_pagemap *pgmap, > + unsigned int order) > { > - zone_device_page_init(&folio->page, order); > + zone_device_page_init(&folio->page, pgmap, order); > if (order) > folio_set_large_rmappable(folio); > } > diff --git a/lib/test_hmm.c b/lib/test_hmm.c > index 8af169d3873a..455a6862ae50 100644 > --- a/lib/test_hmm.c > +++ b/lib/test_hmm.c > @@ -662,7 +662,9 @@ static struct page *dmirror_devmem_alloc_page(struct > dmirror *dmirror, > goto error; > } > > - zone_device_folio_init(page_folio(dpage), order); > + zone_device_folio_init(page_folio(dpage), > + page_pgmap(folio_page(page_folio(dpage), 0)), > + order); > dpage->zone_device_data = rpage; > return dpage; > > diff --git a/mm/memremap.c b/mm/memremap.c > index 63c6ab4fdf08..6f46ab14662b 100644 > --- a/mm/memremap.c > +++ b/mm/memremap.c > @@ -477,10 +477,28 @@ void free_zone_device_folio(struct folio *folio) > } > } > > -void zone_device_page_init(struct page *page, unsigned int order) > +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, > + unsigned int order) > { > + struct page *new_page = page; > + unsigned int i; > + > VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); > > + for (i = 0; i < (1UL << order); ++i, ++new_page) { > + struct folio *new_folio = (struct folio *)new_page; > + > + new_page->flags.f &= ~0xffUL; /* Clear possible order, page > head */ This seems odd to me, mainly due to the "magic" number. Why not just clear the flags entirely? Or at least explicitly just clear the flags you care about which would remove the need for the comment and should let you use the proper PageFlag functions. > +#ifdef NR_PAGES_IN_LARGE_FOLIO > + ((struct folio *)(new_page - 1))->_nr_pages = 0; > +#endif > + new_folio->mapping = NULL; > + new_folio->pgmap = pgmap; /* Also clear compound head */ > + new_folio->share = 0; /* fsdax only, unused for device > private */ It would be nice if the FS DAX code actually used this as well. Is there a reason that change was dropped from the series? > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio); > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio); > + } > + > /* > * Drivers shouldn't be allocating pages after calling > * memunmap_pages(). > -- > 2.43.0 >
