On Wed, 1 Jul 2026 16:28:08 -0700, [email protected] wrote:

> On Wed,  1 Jul 2026 17:05:45 +0800 "Li Zhe" <[email protected]> wrote:
> 
> > memmap_init_zone_device() can take a noticeable amount of time when large
> > pmem namespaces are bound or rebound, because it initializes nearly
> > identical struct page descriptors one PFN at a time. This series reduces
> > that ZONE_DEVICE memmap initialization overhead by reusing prepared
> > struct page templates and, on x86, using memcpy_nt() for the template
> > copy path.
> >
> > The main target is large fsdax/devdax pmem configurations, where the
> > cost of initializing the memmap shows up directly in nd_pmem/dax_pmem
> > bind and rebind latency.
> >
> > Patches 1-3 are preparatory cleanups and helper extraction. Patches 4-5
> > add the template-copy fast path for head pages and compound tails.
> > Patches 6-8 introduce memcpy_nt()/memcpy_nt_drain(), extend the x86
> > fixed-size memcpy_flushcache() inline cases used by that helper, and
> > switch the template-copy path over to memcpy_nt().
> >
> > The fast path remains disabled when the page_ref_set tracepoint is
> > active, and sanitized builds stay on the slow path so their instrumented
> > stores are preserved. Architectures without a specialized memcpy_nt()
> > backend continue to fall back to memcpy().
> >
> > Tested in a VM with a 100 GB fsdax namespace device configured with
> > map=dev and a 100 GB devdax namespace (align=2097152) on Intel Ice Lake
> > server.
> 
> Thanks for persisting with this.
> 
> Review is still thin :( I see that Mike, Boris and Alistair have
> commented on previous versions.  As did Balbir, who wasn't cc'ed on
> this (fixed).
> 
> I'll add it to mm.git for testing exposure (because I'm still a sucker
> for speedups), but more review is needed, please.

Thanks, Andrew. I appreciate your help here. I'll address the latest
review comments in the next version.

Thanks,
Zhe
