On Sunday, 10 June 2018 at 12:49:31 UTC, Mike Franklin wrote:
I'm not experienced with this kind of programming, so I'm doubting these results. Have I done something wrong? Am I overlooking something?


Hi,

I've spent a lot of time optimizing memcpy. One of the results was that with Intel ICC the compiler intrinsics were unbeatable. Please make a version that is guaranteed to use the corresponding backend intrinsic, for example on LLVM.

The reasoning is that copies of small, statically known size will be reduced to a few inlined byte moves, whereas larger ones will call the highly optimized C stdlib routines. If you use ASM instead of IR, the optimization barriers and register spilling will probably make you less efficient.
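
To make that concrete, here is a minimal C sketch of the two cases. The names `Block16`, `copy_block16` and `copy_bytes` are made up for illustration, and what the compiler actually emits depends on the compiler, target and optimization level, so treat the comments as the usual behaviour rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte record, used only for illustration. */
typedef struct {
    uint32_t words[4];
} Block16;

/* The size is a compile-time constant, so optimizing compilers
   usually replace this memcpy with a few inline moves.
   This is typical behaviour, not something the C standard promises. */
void copy_block16(Block16 *dst, const Block16 *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}

/* The size is only known at run time, so this typically stays
   a call into the C library's tuned memcpy. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    memcpy(dst, src, n);
}
```

Compiling with something like `-O2 -S` on GCC or Clang and reading the assembly should show whether the first function was inlined into plain moves on your target.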
