Re: Replacing C's memcpy with a D implementation

Walter Bright via Digitalmars-d Sun, 10 Jun 2018 15:26:26 -0700

On 6/10/2018 11:16 AM, David Nadlinger wrote:

> Because of the large amounts of noise, the only conclusion one can draw from this is that memcpyD is the slowest,


Probably because it does a memory allocation.

> followed by the ASM implementation.

The CPU makers abandoned optimizing the REP instructions decades ago, and just left the clunky implementations there for backwards compatibility.

> In fact, memcpyC and memcpyNaive produce exactly the same machine code (without bounds checking), as LLVM recognizes the loop and lowers it into a memcpy. memcpyDstdAlg instead gets turned into a vectorized loop, for reasons I didn't investigate any further.

This amply illustrates my other point that looking at the assembler generated is crucial to understanding what's happening.
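
For readers who want to see the loop-to-memcpy lowering for themselves, here is a minimal C sketch. It is not the benchmark code from this thread: memcpy_naive is a hypothetical stand-in for memcpyNaive, written in C only because the same LLVM backend sits behind clang. Whether the loop actually becomes a memcpy call depends on the compiler version and flags, so treat it as something to check in the generated assembler rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the thread's memcpyNaive: a plain byte loop.
   `restrict` promises the two buffers do not overlap, which is the
   property an optimizer needs before it may treat the loop as a memcpy. */
void *memcpy_naive(void *restrict dst, const void *restrict src, size_t n)
{
    /* A zero-length copy may pass null pointers; anything else may not. */
    assert(n == 0 || (dst != NULL && src != NULL));

    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Copy one byte per iteration, with no manual unrolling or vectorizing. */
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];

    return dst;
}
```

Compiling this with something like `clang -O2 -S` and reading the output for memcpy_naive shows whether the loop was replaced by a call to memcpy, vectorized, or left as written.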
