On Wed, 1 Jul 2026 16:28:08 -0700, [email protected] wrote:
> On Wed, 1 Jul 2026 17:05:45 +0800 "Li Zhe" <[email protected]> wrote:
>
> > memmap_init_zone_device() can take a noticeable amount of time when large
> > pmem namespaces are bound or rebound, because it initializes nearly
> > identical struct page descriptors one PFN at a time. This series reduces
> > that ZONE_DEVICE memmap initialization overhead by reusing prepared
> > struct page templates and, on x86, using memcpy_nt() for the template
> > copy path.
> >
> > The main target is large fsdax/devdax pmem configurations, where the
> > cost of initializing the memmap shows up directly in nd_pmem/dax_pmem
> > bind and rebind latency.
> >
> > Patches 1-3 are preparatory cleanups and helper extraction. Patches 4-5
> > add the template-copy fast path for head pages and compound tails.
> > Patches 6-8 introduce memcpy_nt()/memcpy_nt_drain(), extend the x86
> > fixed-size memcpy_flushcache() inline cases used by that helper, and
> > switch the template-copy path over to memcpy_nt().
> >
> > The fast path remains disabled when the page_ref_set tracepoint is
> > active, and sanitized builds stay on the slow path so their instrumented
> > stores are preserved. Architectures without a specialized memcpy_nt()
> > backend continue to fall back to memcpy().
> >
> > Tested in a VM with a 100 GB fsdax namespace device configured with
> > map=dev and a 100 GB devdax namespace (align=2097152) on Intel Ice Lake
> > server.
>
> Thanks for persisting with this.
>
> Review is still thin :( I see that Mike, Boris and Alistair have
> commented on previous versions. As did Balbir, who wasn't cc'ed on
> this (fixed).
>
> I'll add it to mm.git for testing exposure (because I'm still a sucker
> for speedups), but more review is needed, please.

Thanks, Andrew. I appreciate your help here. I'll address the latest
review comments in the next version.

Thanks,
Zhe

