Hi!
It's not trivial to achieve the maximum possible performance for a task as trivial as memory block transfer. From my experience with game programming, I can tell you that for small data blocks of constant size it's generally best to unroll the copy completely, that is, to use a series of (interleaved) "mov" instructions. When you have small data blocks to move but the size isn't known at compile time, it's generally best to use "rep movsb" without any additional logic. With larger blocks, it really pays off to optimize for things like DWORD/QWORD alignment and cache prefetching (available in most modern CPU architectures). Ideally, you have specialized copy/move routines for the different architectures (Pentium, K6, Athlon, MMX, SSE, SSE2, etc.) and just call (or emit a call to) the appropriate one. The cost of "call/ret" is not significant on newer processors.

So when the size isn't known at compile time, I suggest a simple comparison of the block size against some threshold, followed by either "rep movsb" or a call to a memmove() optimized for the current processor architecture. If the size is known at compile time and it is small, just unroll the loop. If the size is above some threshold, just call memmove().

Just my $0.05 ;-)

Jarek

On 25 May 2002, Miguel de Icaza wrote:

> > > memcpy already takes care of copying in the fastest way possible.
> >
> > That's right, but we still have a call, a ret, and a conditional or two ;-)
>
> I was going to say exactly that ;-)
>
> > By inlining we can get rid of these things (especially if size is known up-front).
> > Moreover, due to JIT's dynamic nature it's possible to generate faster code at
> > run-time.
> > For example, the following (generic) memcpy is faster on pre-Pentium x86s (Intel
> > syntax):
> > mov esi, $src
> > mov ecx, $size
> > mov edi, $dest
> > shr ecx,1
> > rep movsw
> > adc cl,cl
> > rep movsb
> >
> > For const size==1 we could just mov al, [src]; mov [dest],al
> > etc. etc.
> > BTW, MS JIT uses similar optimizations for cpblk/initblk.
>
> Exactly.
> The same logic that lives in memmove() for the data size
> quantum can be inlined by the JIT engine trivially.
>
> However, how often does this happen? Until a couple of days ago we did
> not have cpblk, so my guess is that measuring the performance impact
> might not be immediately noticeable.
>
> I would very much like to see this at some point.
>
> Miguel.
>
> _______________________________________________
> Mono-list maillist - [EMAIL PROTECTED]
> http://lists.ximian.com/mailman/listinfo/mono-list
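A minimal C sketch of the size-based dispatch suggested at the top of this reply follows. It is not Mono's JIT code. The names `copy8`, `copy_small`, `copy_block` and the cutoff `SMALL_COPY_THRESHOLD` are hypothetical, and the threshold value of 64 bytes is an arbitrary placeholder that would have to be measured per CPU. The byte loop is only a portable stand-in for a bare "rep movsb", because an optimizing compiler may still rewrite it as a call or vector code. The sketch assumes non-overlapping source and destination on the small path. The large path uses memmove() as in the post, relying on the C library to supply the architecture-tuned routine.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: the best value depends on the CPU and must be measured. */
#define SMALL_COPY_THRESHOLD 64

/* Size known at compile time (hypothetical 8-byte case):
 * fully unrolled, no loop counter and no call. */
static inline void copy8(unsigned char *dst, const unsigned char *src)
{
    dst[0] = src[0]; dst[1] = src[1]; dst[2] = src[2]; dst[3] = src[3];
    dst[4] = src[4]; dst[5] = src[5]; dst[6] = src[6]; dst[7] = src[7];
}

/* Small block of unknown size: a plain forward byte loop with no
 * alignment logic, the C analogue of a bare "rep movsb".
 * Assumes the buffers do not overlap. */
static void copy_small(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}

/* One size comparison, then either the simple loop or the
 * library's (presumably CPU-tuned) memmove(). */
void copy_block(void *dst, const void *src, size_t n)
{
    if (n < SMALL_COPY_THRESHOLD)
        copy_small(dst, src, n);
    else
        memmove(dst, src, n);
}
```

A JIT would make the same choice when emitting code for cpblk. If the size is a constant, it emits the unrolled moves directly. Otherwise it emits the comparison and the two paths.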

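The quoted pre-Pentium sequence splits the count into halves. "shr ecx,1" leaves size/2 in ECX and moves the low bit into the carry flag. "rep movsw" then copies that many 16-bit words and leaves ECX at zero, so "adc cl,cl" turns the carry into a byte count of 0 or 1 for the final "rep movsb". The C rendering below shows the same arithmetic, assuming non-overlapping buffers and a forward copy (the assembly likewise assumes the direction flag is clear). The function name `copy_words_then_byte` is hypothetical. The fixed-size `memcpy(d, s, 2)` is just the portable way to express a possibly unaligned 16-bit move; compilers usually lower it to a single load/store, though that is not guaranteed.

```c
#include <stddef.h>
#include <string.h>

/* C rendering of "shr ecx,1 / rep movsw / adc cl,cl / rep movsb":
 * copy size/2 16-bit units, then one trailing byte if size is odd. */
void copy_words_then_byte(void *dest, const void *src, size_t size)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t words = size >> 1;  /* shr ecx,1: number of 16-bit words */
    size_t tail = size & 1;    /* bit shifted into CF; adc cl,cl makes it a count */

    /* rep movsw: two bytes per step */
    while (words--) {
        memcpy(d, s, 2);
        d += 2;
        s += 2;
    }

    /* rep movsb with a count of 0 or 1 */
    if (tail)
        *d = *s;
}
```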