Re: Replacing C's memcpy with a D implementation

David Nadlinger via Digitalmars-d Sun, 10 Jun 2018 16:41:27 -0700

On Sunday, 10 June 2018 at 22:23:08 UTC, Walter Bright wrote:

On 6/10/2018 11:16 AM, David Nadlinger wrote:
Because of the large amounts of noise, the only conclusion one can draw from this is that memcpyD is the slowest,
Probably because it does a memory allocation.


Of course; that was already pointed out earlier in the thread.

The CPU makers abandoned optimizing the REP instructions decades ago, and just left the clunky implementations there for backwards compatibility.

That's not entirely true. Intel started optimising some of the REP string instructions again on Ivy Bridge and above. There is a CPUID bit to indicate that (ERMS?); I'm sure the Optimization Manual has further details. From what I remember, `rep movsb` is supposed to beat an AVX loop on most recent Intel µarchs if the destination is aligned and the data is longer than a few cache lines. I've never measured that myself, though.
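For anyone who wants to check this on their own machine, here is a minimal sketch in C, assuming GCC or Clang on x86 (it relies on `<cpuid.h>` and GNU inline assembly). As far as I know, the ERMS flag is reported in CPUID leaf 7, subleaf 0, EBX bit 9. The names `has_erms` and `copy_rep_movsb` are made up for illustration and do not come from the thread or from any library. On non-x86 targets the sketch falls back to plain `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header for the CPUID instruction */
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* Hypothetical helper: returns 1 if CPUID reports ERMS
 * (leaf 7, subleaf 0, EBX bit 9), otherwise 0. */
static int has_erms(void)
{
#if HAVE_X86
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* __get_cpuid_count returns 0 if the requested leaf is unsupported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ebx >> 9) & 1u);
#else
    return 0;
#endif
}

/* Hypothetical helper: forward copy of n bytes using `rep movsb`.
 * Assumes non-overlapping buffers and a clear direction flag,
 * which the usual x86 calling conventions guarantee. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
#if HAVE_X86
    void *d = dst;
    const void *s = src;
    /* RDI = destination, RSI = source, RCX = byte count. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(s), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    return memcpy(dst, src, n);   /* portable fallback, not `rep movsb` */
#endif
}
```

A benchmark could call `has_erms()` once at startup and time `copy_rep_movsb` against an AVX loop only when it returns 1. Whether it actually wins will depend on the µarch, alignment and copy size, so this is only a starting point for measuring.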


 — David
