On 26/03/2009 20:08, Don wrote:
BTW: I tested the memcpy() code provided in AMD's 1992 optimisation
manual, and in Intel's 2007 manual. Only one of them actually gave any
benefit when run on a 2008 Intel Core2 -- which was it? (Hint: it wasn't
Intel!)
I've noticed that AMD's docs are usually greatly superior to Intel's, but
this time the difference is unbelievable.

Don, have you seen Agner Fog's memcpy() and memmove() implementations included with the most recent versions of his manuals? In the unaligned case they read two aligned 16-byte blocks of the source into XMM registers and shift/combine them to match the destination's alignment, so all loads and stores are aligned. Pretty cool.

He says (modestly):

; This method is 2 - 6 times faster than the implementations in the
; standard C libraries (MS, Gnu) when src or dest are misaligned.
; When src and dest are aligned by 16 (relative to each other) then this
; function is only slightly faster than the best standard libraries.
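For anyone who wants to see the shift/combine idea in C rather than asm, here is a minimal sketch with SSE2 intrinsics (assumes x86/x86-64). It is not Agner's code: `copy_blocks_shift_combine` and `SHIFT_COMBINE_CASE` are hypothetical names, the destination is assumed to be 16-byte aligned, and only whole 16-byte blocks are copied. Since `_mm_srli_si128`/`_mm_slli_si128` need the byte count as a compile-time constant, the inner loop is stamped out once per misalignment value.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics, baseline on x86-64 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One loop per misalignment k (1..15): the byte-shift intrinsics take an
 * immediate, so the loop body is duplicated for every possible k. */
#define SHIFT_COMBINE_CASE(k)                                              \
    case k:                                                                \
        for (size_t i = 0; i < nblocks; i++) {                             \
            /* next aligned 16-byte block of the source */                 \
            __m128i hi =                                                   \
                _mm_load_si128((const __m128i *)(base + 16 * (i + 1)));    \
            /* drop the k leading bytes of lo, fill the top k from hi */   \
            __m128i out = _mm_or_si128(_mm_srli_si128(lo, k),             \
                                       _mm_slli_si128(hi, 16 - k));        \
            _mm_store_si128((__m128i *)(dst + 16 * i), out);               \
            lo = hi;                                                       \
        }                                                                  \
        break;

/* Hypothetical helper: copy nblocks 16-byte blocks from src to dst.
 * Assumes dst is 16-byte aligned and that every aligned 16-byte block
 * overlapping the source range is readable: the first and last loads
 * touch bytes just outside the copied range, but never leave an
 * aligned block that contains a copied byte. */
static void copy_blocks_shift_combine(unsigned char *dst,
                                      const unsigned char *src,
                                      size_t nblocks)
{
    assert(((uintptr_t)dst & 15) == 0);
    if (nblocks == 0)
        return;
    size_t off = (uintptr_t)src & 15;
    if (off == 0) {
        /* src and dst aligned relative to each other: nothing to shift */
        memcpy(dst, src, 16 * nblocks);
        return;
    }
    const unsigned char *base = src - off; /* aligned start of src */
    __m128i lo = _mm_load_si128((const __m128i *)base);
    switch (off) {
        SHIFT_COMBINE_CASE(1)  SHIFT_COMBINE_CASE(2)  SHIFT_COMBINE_CASE(3)
        SHIFT_COMBINE_CASE(4)  SHIFT_COMBINE_CASE(5)  SHIFT_COMBINE_CASE(6)
        SHIFT_COMBINE_CASE(7)  SHIFT_COMBINE_CASE(8)  SHIFT_COMBINE_CASE(9)
        SHIFT_COMBINE_CASE(10) SHIFT_COMBINE_CASE(11) SHIFT_COMBINE_CASE(12)
        SHIFT_COMBINE_CASE(13) SHIFT_COMBINE_CASE(14) SHIFT_COMBINE_CASE(15)
    }
}
```

A complete memcpy() would also need scalar head and tail code to reach an aligned destination and to copy a final partial block; the sketch leaves those out to keep the aligned-load/aligned-store core visible.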
