On Thursday, 29 December 2011 at 14:44:45 UTC, Don wrote:
http://www.danielvik.com/2010/02/fast-memcpy-in-c.html . It doesn't even
use inline assembler or compiler intrinsics.
Note that the memcpy described there is _far_ from optimal.
Memcpy is all about cache efficiency. DMD translates memcpy to
the single instruction "rep movsd" which you'd think would be
optimal, but you can actually beat it by a factor of four or
more for long lengths.
I've never seen DMD emit rep movsd. Does rep movsd even make
sense when the memory areas do not have the same alignment?
memcpy in snn.lib has a rep movsd instruction, but there's lots
of other code (including what looks like Duff's device).
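
To make the alignment question concrete, here is a rough sketch in C of the usual head/body/tail shape. It is not taken from snn.lib or from DMD's output. The name copy_aligned_dst and the 4-byte word size are my own assumptions for illustration. It copies single bytes until the destination is 4-byte aligned, then moves one 32-bit word per step, then finishes the leftover bytes. When source and destination have different alignment, only one side can be aligned in the word loop, so the source words are loaded through a temporary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only; not the snn.lib memcpy. */
void *copy_aligned_dst(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: byte copies until the destination is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: one 32-bit word per step. The source may still be
       misaligned here, so each word goes through a temporary
       using memcpy, which avoids undefined misaligned access. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, 4);
        memcpy(d, &w, 4);
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: the remaining 0 to 3 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

This says nothing about speed. It only shows that a word-sized loop still works when the two areas are aligned differently, at the cost of extra code around it.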