On Thursday, 29 December 2011 at 14:44:45 UTC, Don wrote:
http://www.danielvik.com/2010/02/fast-memcpy-in-c.html . It doesn't even
use inline assembler or compiler intrinsics.

Note that the memcpy described there is _far_ from optimal. Memcpy is all about cache efficiency. DMD translates memcpy to the single instruction "rep movsd", which you'd think would be optimal, but you can actually beat it by a factor of four or more for long lengths.

I've never seen DMD emit rep movsd. Does rep movsd even make sense when the memory areas do not have the same alignment? memcpy in snn.lib has a rep movsd instruction, but there's lots of other code (including what looks like Duff's device).
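
To make the alignment question concrete, here is a rough sketch in C of the usual head/body/tail shape. As far as I know, rep movsd on x86 does not fault on unaligned addresses, it is just potentially slower, which is why hand-written copies tend to align the destination first. The name copy_words is made up for illustration; this is not the code in snn.lib or anything DMD emits, and it makes no attempt at the cache tricks Don mentions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only; not snn.lib's memcpy. */
void *copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: copy single bytes until the destination is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: move one 32-bit word per iteration. The source may still be
       misaligned relative to the destination, so the word is moved via
       fixed-size memcpy calls to stay within defined behaviour. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: copy the remaining 0 to 3 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

When source and destination have different alignment, only one side can be aligned this way; the other side pays for unaligned accesses in the word loop.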
