<snip>
>
> > From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> > Sent: Monday, 25 July 2022 03.18
> >
> > [...]
>
> > > Yes, x86 needs 16B alignment for NT load/stores But that's supposed
> > > to be arch specific limitation, that we probably want to hide, no?
>
> Correct. However, optional hints for optimization purposes will be available.
> And it is up to the architecture specific implementation to make the best use
> of these hints, or just ignore them.
>
> > > Inside the function can check alignment of both src and dst and
> > > decide should it use NT load/store instructions or just do normal copy.
> > IMO, the normal copy should not be done by this API under any
> > conditions. Why not let the application call memcpy/rte_memcpy when
> > the NT copy is not applicable? It helps the programmer to understand
> > and debug the issues much easier.
>
> Yes, the programmer must choose between normal memcpy() and
> non-temporal rte_memcpy_nt(). I am offering new functions, not modifying
> memcpy() or rte_memcpy().
>
> And rte_memcpy_nt() will silently fall back to normal memcpy() if
> non-temporal copying is unavailable, e.g. on POWER and RISC-V architectures,
> which don't have NT load/store instructions.

I am talking about a scenario where the application is being ported between architectures. Not everyone knows the capabilities of every architecture. It is better to indicate upfront (e.g. through a compilation failure) that a certain feature is not supported on the target architecture than to leave the user to discover it through painful debugging.
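
To make the two positions concrete, here is a rough sketch, not the RFC's actual code. All names in it (nt_copy_strict, nt_copy_fallback, HAVE_NT_STORE, REQUIRE_NT_STORE, NT_ALIGN) are made up for illustration, and it only covers x86 with SSE2, using an aligned load plus a non-temporal store. The strict variant never does a normal copy and reports why it could not use NT instructions; the fallback variant hides that and silently uses memcpy(). Building with -DREQUIRE_NT_STORE on a target without NT stores would turn the missing feature into a compilation failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>          /* _mm_load_si128, _mm_stream_si128, _mm_sfence */
#define HAVE_NT_STORE 1         /* hypothetical feature macro */
#else
#define HAVE_NT_STORE 0
#endif

/* Opt-in compile-time failure, as argued for in the reply above (hypothetical switch). */
#if !HAVE_NT_STORE && defined(REQUIRE_NT_STORE)
#error "non-temporal stores are not available on this target"
#endif

#define NT_ALIGN 16u            /* 16-byte alignment needed by the SSE2 instructions used here */

static int is_nt_aligned(const void *p)
{
    return ((uintptr_t)p % NT_ALIGN) == 0;
}

/*
 * Strict variant: copies with non-temporal stores or not at all.
 * Returns 0 on success, -EINVAL if src, dst or len do not meet the
 * alignment requirement, -ENOTSUP if the target has no NT stores.
 */
static int nt_copy_strict(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
#if HAVE_NT_STORE
    if (!is_nt_aligned(dst) || !is_nt_aligned(src) || len % NT_ALIGN != 0)
        return -EINVAL;

    const __m128i *s = (const __m128i *)src;
    __m128i *d = (__m128i *)dst;

    for (size_t i = 0; i < len / NT_ALIGN; i++)
        _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));
    _mm_sfence();               /* order the NT stores before later stores */
    return 0;
#else
    (void)dst; (void)src; (void)len;
    return -ENOTSUP;
#endif
}

/* Fallback variant: silently uses memcpy() whenever the strict copy refuses. */
static void nt_copy_fallback(void *dst, const void *src, size_t len)
{
    if (nt_copy_strict(dst, src, len) != 0)
        memcpy(dst, src, len);
}
```

With the strict variant the application sees the error code and decides for itself whether to call memcpy() or rte_memcpy(); with the fallback variant the copy always succeeds, but the caller cannot tell whether NT instructions were actually used.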