Rog writes:

> According to my tests, cacheable_memcpy is approximately 40%
> faster than the original glibc version, which is quite an
> improvement: with my tests, the glibc version took approx. 69s
> to run, while the cacheable_memcpy took only 42s (repeated
> many times to avoid noise errors).

Wait a second... start with a high-level view. Why is memcpy being
used so much? What is it being called on? (How big is it? Is either
address 8-byte or 32-byte aligned? Is the source cached/cacheable?
Is the destination cached/cacheable? Etc.) Maybe you'd better
profile this a bit.

Oh well. Some MPC7xx "code" for you to look at...

Assumptions:

1. huge copies
2. 32-byte alignment (both src & dst)
3. can be cached (both src & dst)

I didn't check to see which FPU registers are available for a
leaf function to abuse. (I forget.) This obviously isn't tested.
It might be good to unroll the loop a bit.

Note that I discard both src and dst. I'm expecting them to be
a megabyte or so, which would just blow away the cache for no
good reason. This way, only one "way" of the n-way associative
cache gets lost. Play with the ordering a bit.

//////////////////////////////////////////////////////////////////
#define dcba dcbz   /* dcba being removed from Power/PowerPC? */
#define dcbi dcbf   /* dcbi is a supervisor-level instruction */
#define dst r3
#define src r4
#define num r5
#define eight r8    /* must load a constant 8 into r8 */

BLAH, BLAH...
        /* The offsets below (8..32, with update) only line up if
           the setup leaves src and dst 8 bytes before the first
           cache line and CTR holding the number of 32-byte lines. */

        dcbt    eight,src       /* prefetch the next cache line */
loop_top:
        dcba    eight,dst       /* allocate a cache line */
        lfd     f11,8(src)
        lfd     f12,16(src)
        lfd     f13,24(src)
        lfdu    f14,32(src)
        dcbi    r0,src          /* would like to discard the src data */
        dcbt    eight,src       /* prefetch the next cache line */
        stfd    f11,8(dst)
        stfd    f12,16(dst)
        stfd    f13,24(dst)
        stfdu   f14,32(dst)
        dcbf    r0,dst          /* write back if needed, then invalidate */
        bdnz    loop_top
BLAH, BLAH...
//////////////////////////////////////////////////////////////////
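
For anyone who wants to play with the ordering without writing
assembly first, here is a rough C sketch of the same loop shape:
one 32-byte line per iteration, loaded as four 8-byte words,
with a prefetch hint for the next source line. It is only a
sketch under the same three assumptions. The names copy_lines,
LINE_BYTES and WORDS_PER_LINE are made up for illustration, and
__builtin_prefetch is a GCC/Clang builtin standing in for dcbt.
There is no portable C equivalent of the dcbz/dcbf allocate and
discard steps, so those are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES     32   /* assumed cache-line size (MPC7xx) */
#define WORDS_PER_LINE (LINE_BYTES / sizeof(uint64_t))

/*
 * Hypothetical helper, not part of glibc or the original post.
 * Copies nlines whole cache lines from src to dst.
 * Assumes both pointers are 32-byte aligned and do not overlap.
 */
void copy_lines(uint64_t *dst, const uint64_t *src, size_t nlines)
{
    assert((uintptr_t)dst % LINE_BYTES == 0);
    assert((uintptr_t)src % LINE_BYTES == 0);

    while (nlines--) {
        /* Hint the next source line: read access, low temporal
           locality, since the data will not be needed again soon. */
        __builtin_prefetch(src + WORDS_PER_LINE, 0, 0);

        /* Load the whole line before storing any of it, mirroring
           the four loads followed by four stores in the assembly. */
        uint64_t a = src[0];
        uint64_t b = src[1];
        uint64_t c = src[2];
        uint64_t d = src[3];

        dst[0] = a;
        dst[1] = b;
        dst[2] = c;
        dst[3] = d;

        src += WORDS_PER_LINE;
        dst += WORDS_PER_LINE;
    }
}
```

To copy a 1 MB buffer you would call copy_lines(dst, src,
(1024 * 1024) / LINE_BYTES). Whether the compiler keeps the
loads grouped ahead of the stores is up to the optimizer, so
check the generated code before drawing conclusions from timings.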

