On Thu, May 14, 2026 at 09:49:33AM -0400, Gregory Price wrote:
> On Tue, May 12, 2026 at 05:05:54PM -0400, Michael S. Tsirkin wrote:
> > When post_alloc_hook() needs to zero a page for an explicit
> > __GFP_ZERO allocation for a user page (user_addr is set), use
> > folio_zero_user() instead of kernel_init_pages(). This zeros near the
> > faulting address last, keeping those cachelines hot for the impending
> > user access.
> >
> > folio_zero_user() is only used for explicit __GFP_ZERO, not for
> > init_on_alloc. On architectures with virtually-indexed caches
> > (e.g., ARM), clear_user_highpage() performs per-line cache
> > operations; using it for init_on_alloc would add overhead that
> > kernel_init_pages() avoids (the page fault path flushes the
> > cache at PTE installation time regardless).
> >
> > No functional change yet: current callers do not pass __GFP_ZERO
> > for user pages (they zero at the callsite instead). Subsequent
> > patches will convert them.
> >
> > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > Assisted-by: Claude:claude-opus-4-6
> > ---
> > mm/page_alloc.c | 17 ++++++++++++++---
> > 1 file changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index db387dd6b813..76f39dd026ff 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1861,9 +1861,20 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
> >  		for (i = 0; i != 1 << order; ++i)
> >  			page_kasan_tag_reset(page + i);
> >  	}
> > -	/* If memory is still not initialized, initialize it now. */
> > -	if (init)
> > -		kernel_init_pages(page, 1 << order);
> > +	/*
> > +	 * If memory is still not initialized, initialize it now.
> > +	 * When __GFP_ZERO was explicitly requested and user_addr is set,
> > +	 * use folio_zero_user() which zeros near the faulting address
> > +	 * last, keeping those cachelines hot. For init_on_alloc, use
> > +	 * kernel_init_pages() to avoid unnecessary cache flush overhead
> > +	 * on architectures with virtually-indexed caches.
> > +	 */
> > +	if (init) {
> > +		if ((gfp_flags & __GFP_ZERO) && user_addr != USER_ADDR_NONE)
> > +			folio_zero_user(page_folio(page), user_addr);
> > +		else
> > +			kernel_init_pages(page, 1 << order);
> > +	}
>
> Open question but not necessarily in-scope:
>
> Should __GFP_ZERO just be implied if (user_addr != USER_ADDR_NONE)?

There are callers that omit __GFP_ZERO, but none of them would expose
uninitialized memory to userspace: the content is either filled in from
another source or zeroed separately:

- drm_pagemap.c: GFP_HIGHUSER -- no zero. This is DRM device page
  migration; the page content is preserved from the source.
- test_hmm.c: GFP_HIGHUSER_MOVABLE -- no zero. Test driver; pages get
  their content from the device.
- mm/ksm.c: GFP_HIGHUSER_MOVABLE -- no zero. KSM merges identical
  pages; the content is copied from the source page.
- mm/memory.c: new_folio, GFP_HIGHUSER_MOVABLE -- no zero. This is CoW;
  the content is copied from the old page.
- mm/userfaultfd.c: GFP_HIGHUSER_MOVABLE -- no zero. The content comes
  from userspace via userfaultfd.
- arm64/fault.c: __GFP_ZEROTAGS, not __GFP_ZERO. This is MTE tag
  zeroing, not page zeroing; the page is zeroed separately.
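
For illustration only, the selection made in the hunk above can be
modeled in userspace roughly as below. This is a sketch, not kernel
code: MODEL_GFP_ZERO, MODEL_USER_ADDR_NONE, enum zero_path and
pick_zero_path() are names made up for this example, and their values
are arbitrary. The init parameter stands for the local variable that
post_alloc_hook() has already computed by that point.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the real kernel flag and sentinel values differ. */
#define MODEL_GFP_ZERO		0x1u
#define MODEL_USER_ADDR_NONE	(~0UL)

/* Which zeroing routine the modeled hook would pick. */
enum zero_path {
	ZERO_NONE,		/* no zeroing in the hook */
	ZERO_KERNEL_INIT,	/* models kernel_init_pages() */
	ZERO_FOLIO_USER,	/* models folio_zero_user() */
};

static enum zero_path pick_zero_path(bool init, unsigned int gfp_flags,
				     unsigned long user_addr)
{
	/* Memory already initialized, or no initialization requested. */
	if (!init)
		return ZERO_NONE;

	/* Explicit zero request for a user page: zero near the fault last. */
	if ((gfp_flags & MODEL_GFP_ZERO) && user_addr != MODEL_USER_ADDR_NONE)
		return ZERO_FOLIO_USER;

	/* Everything else, including init_on_alloc, keeps the old path. */
	return ZERO_KERNEL_INIT;
}
```

In this model a copy-style caller (user_addr set, no __GFP_ZERO) gets
ZERO_NONE when init is false and ZERO_KERNEL_INIT when init is true, so
the callers listed above never reach the folio_zero_user() branch.
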
> Putting aside how that's done without introducing another gfp flag
> (maybe something explicit like `alloc_pages_nozero(...)` ), it seems
> like a very short jump to just adding __GFP_ZERO to any user-alloc by
> default.
>
> I'd be curious to know how many callers across the system omit
> __GFP_ZERO when allocating a user-page, and whether there might be
> scenarios where we subtly miss it (seems unlikely and narrow, but very
> possibly something a driver could do unintentionally).
>
> ~Gregory

I'd do this as a separate change on top of this series, if possible.

--
MST