Wow, Sergey is always a bit faster than I am :-) I would include that patch if you remove the MOVAPS or test if the feature is really available. Maybe the simple generic memcopy you posted first is not much slower?
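To make the "test if the feature is really available" idea concrete, here is a rough C sketch of a runtime check with a generic fallback. Everything in it is an assumption for illustration and not taken from the patch: the names have_sse, copy_generic, copy_sse, select_copy and cpblk_copy are hypothetical, the detection relies on the GCC/Clang builtin __builtin_cpu_supports (other compilers simply report "no SSE"), and copy_sse is only a stand-in that calls memcpy where the MOVAPS loop would go. The 1K threshold is the one mentioned for the patch.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if SSE may be used. Assumption: GCC/Clang builtins on x86;
   on any other compiler or CPU we conservatively answer "no". */
static int have_sse(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse");
#else
    return 0;
#endif
}

/* Simple generic copy: a plain byte loop, no alignment tricks and
   no handling of overlapping blocks. */
static void copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

/* Hypothetical stand-in for the SSE path; a real version would use
   aligned MOVAPS loads/stores. Here it just defers to memcpy. */
static void copy_sse(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

typedef void (*copy_fn)(void *, const void *, size_t);

/* Pick the SSE routine only for blocks larger than 1K and only when
   the CPU reports SSE; otherwise fall back to the generic loop. */
static copy_fn select_copy(size_t n)
{
    if (n > 1024 && have_sse())
        return copy_sse;
    return copy_generic;
}

void cpblk_copy(void *dst, const void *src, size_t n)
{
    select_copy(n)(dst, src, n);
}
```

In a JIT the check would presumably run once at code-generation time rather than on every copy, so the emitted code carries no extra branch.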
- Dietmar

On Sun, 2002-05-26 at 03:22, Sergey Chaban wrote:
> Hello!
>
> > However, how often does this happen?
>
> Not very often, most certainly :-)
> As far as I can tell, the opcode is currently used by Managed VC++ to inline memcpy,
> if certain optimizations were enabled or if compiler was explicitly instructed to do so
> with #pragma intrinsic(memcpy).
>
> I think that another use for cpblk is dynamic code generated at runtime (with Reflection.Emit),
> perhaps when size is already known (something similar to self-modifying code often used in the old days).
>
>
> > When you have small data blocks to be moved (but the size isn't known
> > at compile time) it's generally best to "rep movsb" without any additional
> > logic. When you have larger blocks, it really pays off to optimize for
> > things like DWORD/QWORD alignment, cache prefetching (available in most
> > modern CPU architectures). Ideally you have specialized copy/move routines
>
> I totally agreed :-)
> All in all, I think it's perfectly correct to implement cpblk with memmove,
> but I think that it would be wrong to make any assumptions about its behaviour
> (with regard to overlapping blocks), and write code based on these assumptions.
>
> Also not all modern CPUs are x86s ;-)
>
> I put together some tests and this patch with some optimizations for size=const:
> http://mono.eurosoft.od.ua/files/x86.brg.cpblk.diff
>
> Some sample code:
> http://mono.eurosoft.od.ua/files/CpblkTest.il
> http://mono.eurosoft.od.ua/files/BulkCpy.il
>
> These tests are rather synthetic, unfortunately it's currently impossible to run
> VC++ generated executables under Mono - I'd code something more realistic :-)
> The first test is just moving XYZ float vectors around (size=12, in this case performance
> increase is quite noticeable). The second just copies blocks of various sizes.
>
> The patch is quick and dirty, for different sizes it emits code optimized for different CPUs :-)
> Moreover it uses MOVAPS instructions to copy blocks larger than 1K without checking
> if SSE is actually available, so second test will crash on CPUs without SSE.
> It uses FPU to move blocks of certain sizes which is faster on older Pentiums/486 but slow on P6+.
> This is just to demonstrate/test CPU-specific optimizations for cpblk.
