Re: Replacing C's memcpy with a D implementation

Temtaime via Digitalmars-d Sun, 10 Jun 2018 15:41:35 -0700

On Sunday, 10 June 2018 at 22:23:08 UTC, Walter Bright wrote:

On 6/10/2018 11:16 AM, David Nadlinger wrote:
Because of the large amounts of noise, the only conclusion one can draw from this is that memcpyD is the slowest,
Probably because it does a memory allocation.
followed by the ASM implementation.
The CPU makers abandoned optimizing the REP instructions decades ago, and just left the clunky implementations there for backwards compatibility.
In fact, memcpyC and memcpyNaive produce exactly the same machine code (without bounds checking), as LLVM recognizes the loop and lowers it into a memcpy. memcpyDstdAlg instead gets turned into a vectorized loop, for reasons I didn't investigate any further.
This amply illustrates my other point that looking at the assembler generated is crucial to understanding what's happening.

On some CPU architectures (for example, Intel Atom), rep movsb is the fastest memcpy.
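
For anyone who wants to measure this on their own hardware, here is a minimal sketch of a rep movsb copy. It is written in C with GCC/Clang inline assembly rather than D, and it is not the ASM implementation benchmarked earlier in the thread. The name memcpy_rep_movsb is hypothetical, and the plain byte loop is only a fallback so the code still builds on other compilers and architectures.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the thread: copies n bytes from src
   to dst using REP MOVSB where available. Regions must not overlap. */
static void *memcpy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;  /* memcpy returns the original destination pointer */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* REP MOVSB copies (E/R)CX bytes from [(E/R)SI] to [(E/R)DI].
       The "D", "S" and "c" constraints pin the operands to those registers.
       The x86 ABIs require the direction flag to be clear on function
       entry, so the copy runs forward. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    /* Portable fallback: plain byte-by-byte copy. */
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}
```

Whether this beats the library memcpy depends on the CPU and on the copy size, so it only makes sense as one more candidate in the benchmark, not as a drop-in replacement.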
