On Wed, Jun 10, 2026 at 12:19:20 -0700, [email protected] wrote:
> On Tue, Jun 09, 2026 at 08:01:32PM +0800, Li Zhe wrote:
> > That said, I see your layering point. If arch/x86/include/asm/string.h
> > is the preferred place for the arch-visible wrapper, I can move the
> > wrapper there in the next revision while keeping the x86_64-specific
> > implementation details in string_64.h.
>
> No, 64-bit only's fine. We don't put any new features into 32-bit already
> anyway but that wasn't clear from the commit message what your goal is.

Thanks, that makes sense.

> > Thinking about it more, I agree that this is hard to justify for a
> > generic helper. For this series, what really matters is that the
> > struct page copies in patch 8 can use the existing x86
> > memcpy_flushcache() fastpaths where that is beneficial; I do not need
> > patch 6 to impose extra selection policy on unrelated callers.
>
> What I am asking is, you need to show numbers why those helpers exist.
>
> Your 0th message is talking about measuring this in VMs. If this workload is
> not VM-specific, then those numbers don't matter. They're just handwaving.
>
> So I'd need a good justification why we need the changes before we go any
> further.

Understood. I do not currently have access to physical PMEM hardware on
my side, so the numbers I posted so far were all from a VM-based setup.
I agree that this is not sufficient justification for introducing the
helper / x86 nt part of the series.

For the next resend, I will first split out and resend the mm-only
subset, and drop the helper / x86 nt part for now.

If anyone has access to real PMEM hardware and is willing to test
whether that part shows a measurable benefit there, I would greatly
appreciate it. Otherwise, is there a preferred way to justify or
validate that part without physical PMEM measurements, or is the right
approach simply to keep it out of the series until such data is
available?

Thanks,
Zhe

