<snip>

> 
> > From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> > Sent: Monday, 25 July 2022 03.18
> >
> 
> [...]
> 
> > > Yes, x86 needs 16B alignment for NT load/stores. But that's supposed to
> > > be an arch-specific limitation that we probably want to hide, no?
> 
> Correct. However, optional hints for optimization purposes will be available.
> It is up to the architecture-specific implementation to make the best use
> of these hints, or to just ignore them.
> 
> > > Inside, the function can check the alignment of both src and dst and
> > > decide whether it should use NT load/store instructions or just do a
> > > normal copy.
> > IMO, the normal copy should not be done by this API under any
> > conditions. Why not let the application call memcpy/rte_memcpy when
> > the NT copy is not applicable? It makes the issues much easier for the
> > programmer to understand and debug.
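To make the stricter contract concrete, here is a minimal sketch, assuming x86 with SSE2 (GCC/Clang define `__SSE2__`) is the only NT-capable target. The names `example_try_memcpy_nt()` and `app_copy_checked()` are hypothetical and are not DPDK API. The copy function reports failure instead of falling back, so any normal copy becomes a visible decision in the application.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/*
 * Hypothetical strict API: copies only if a non-temporal copy is possible
 * and never falls back to a normal copy. Returns 0 on success, -ENOTSUP if
 * the target has no NT stores, -EINVAL if dst is not 16-byte aligned or n
 * is not a multiple of 16.
 */
static int example_try_memcpy_nt(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
#if defined(__SSE2__)
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (((uintptr_t)d & 15) != 0 || (n & 15) != 0)
        return -EINVAL;
    for (size_t i = 0; i < n; i += 16) {
        /* Unaligned load from src, streaming (NT) store to aligned dst. */
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        _mm_stream_si128((__m128i *)(d + i), v);
    }
    _mm_sfence(); /* order the NT stores before subsequent stores */
    return 0;
#else
    (void)dst;
    (void)src;
    (void)n;
    return -ENOTSUP;
#endif
}

/* Application code: falling back to memcpy() is an explicit choice here. */
static void app_copy_checked(void *dst, const void *src, size_t n)
{
    if (example_try_memcpy_nt(dst, src, n) != 0)
        memcpy(dst, src, n);
}
```

On a target without NT stores the non-zero return shows up in the caller's own code path, rather than being hidden inside the library.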
> 
> Yes, the programmer must choose between normal memcpy() and non-
> temporal rte_memcpy_nt(). I am offering new functions, not modifying
> memcpy() or rte_memcpy().
> 
> And rte_memcpy_nt() will silently fall back to normal memcpy() if non-
> temporal copying is unavailable, e.g. on POWER and RISC-V architectures,
> which don't have NT load/store instructions.
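For comparison, here is a minimal sketch of the silent-fallback behavior with optional hints, assuming x86 SSE2 streaming stores (`_mm_stream_si128()`) are the only NT path. `example_memcpy_nt()` and the `EXAMPLE_NT_HINT_*` flags are hypothetical names, not the proposed rte_memcpy_nt() signature. The destination hint enables streaming stores after a normal copy of the unaligned head. Every other case, including non-x86 targets, quietly becomes memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical hint flags; an implementation may use or ignore them. */
#define EXAMPLE_NT_HINT_SRC 0x1u /* source data will not be reused soon */
#define EXAMPLE_NT_HINT_DST 0x2u /* destination will not be read soon */

/* Always copies n bytes; NT stores are an optimization, never a requirement. */
static void example_memcpy_nt(void *dst, const void *src, size_t n,
                              unsigned int flags)
{
    if (n == 0)
        return;
    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    if (flags & EXAMPLE_NT_HINT_DST) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        /* Normal copy of the head until dst reaches 16-byte alignment. */
        size_t head = (16 - ((uintptr_t)d & 15)) & 15;

        if (head > n)
            head = n;
        memcpy(d, s, head);
        d += head;
        s += head;
        n -= head;
        /* Stream whole 16-byte blocks; the source may stay unaligned. */
        for (; n >= 16; d += 16, s += 16, n -= 16)
            _mm_stream_si128((__m128i *)d,
                             _mm_loadu_si128((const __m128i *)s));
        _mm_sfence(); /* order the NT stores before subsequent stores */
        memcpy(d, s, n); /* normal copy of the tail */
        return;
    }
    /* EXAMPLE_NT_HINT_SRC is ignored here: NT loads need SSE4.1. */
#endif
    (void)flags; /* silences unused-parameter warnings on non-SSE2 targets */
    /* Plain copy: no DST hint, or no NT stores on this target. */
    memcpy(dst, src, n);
}
```

Nothing in the caller's code reveals which path was taken, which is exactly what makes the fallback easy to miss when porting.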

I am talking about a scenario where the application is being ported between
architectures. Not everyone knows the capabilities of each architecture. It is
better to indicate upfront (e.g. with a compilation failure) that a certain
feature is not supported on the target architecture than to have the user
discover it through painful debugging.
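One way to get such an upfront indication is sketched below under stated assumptions. A feature macro (hypothetical `EXAMPLE_HAS_MEMCPY_NT16`) is defined only when x86 SSE2 streaming stores are available, and the NT function (hypothetical `example_memcpy_nt16()`) is compiled only on those targets. Porting code that calls it without a guard then fails to build on other architectures, instead of silently degrading to a normal copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
/* Hypothetical feature macro, defined only where NT stores are available. */
#define EXAMPLE_HAS_MEMCPY_NT16 1

/*
 * Exists only on supporting targets, so an unguarded call fails to build on
 * e.g. POWER or RISC-V instead of silently becoming a normal copy.
 * Contract: dst 16-byte aligned, n a multiple of 16.
 */
static void example_memcpy_nt16(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(((uintptr_t)d & 15) == 0 && (n & 15) == 0);
    for (size_t i = 0; i < n; i += 16)
        _mm_stream_si128((__m128i *)(d + i),
                         _mm_loadu_si128((const __m128i *)(s + i)));
    _mm_sfence(); /* order the NT stores before subsequent stores */
}
#endif

/*
 * A component that cannot work without NT copies could instead refuse to
 * build:
 *   #ifndef EXAMPLE_HAS_MEMCPY_NT16
 *   #error "non-temporal copy not supported on this architecture"
 *   #endif
 */

/* Application code: the porting decision is explicit and visible. */
static void app_copy(void *dst, const void *src, size_t n)
{
#if defined(EXAMPLE_HAS_MEMCPY_NT16)
    if (((uintptr_t)dst & 15) == 0 && (n & 15) == 0) {
        example_memcpy_nt16(dst, src, n);
        return;
    }
#endif
    memcpy(dst, src, n);
}
```

The fallback still exists, but it lives in application code behind a preprocessor guard, so whoever ports the application sees it at build time.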
