> -----Original Message-----
> From: Stefan Fuhrmann [mailto:stefanfuhrm...@alice-dsl.de]
> Sent: Tuesday, 27 April 2010 1:10
> To: Bert Huijben; dev@subversion.apache.org
> Subject: Re: [PATCH] Saving a few cycles, part 1/2
>
> Bert Huijben wrote:
> >
> >> -----Original Message-----
> >> From: Stefan Fuhrmann [mailto:stefanfuhrm...@alice-dsl.de]
> >> In this patch, I eliminated calls to memcpy for small copies as they are
> >> particularly expensive in the MS CRT.
> >>
> >
> > Which CRT did you use for these measurements? (2005, 2008, 2010, Debug vs
> > Release and DLL vs Static?). Which compiler version? (Standard/Express or
> > Professional+). (I assume you use the normal Subversion build using .sln
> > files and not the TortoiseSVN scripts? Did you use the shared library builds
> > or a static build)?
> >
> VSTS2008 Developer Edition. Release build (am I an Amateur?!)
> TSVN build scripts which set /Ox (global opt, intrinsics, omit frame
> pointers, ...)
> > Did you try enabling the intrinsics for this method instead of using a
> > handcoded copy?
> >
> <mode="educational prick">
> Yes, but it does not help in this case: memset will use intrinsics
> only for short (<48 bytes on x86) _fixed-size_ buffers. memcpy
> will use intrinsics for _fixed-size_ buffers only, but seemingly with
> no size limit.

But did you try a non-shared library build? If you use the C runtime as a shared library, things like using __fastcall instead of __cdecl or whole program optimization don't matter, because your build doesn't change msvcr90.dll (or a later version). The overhead of calling a function in a DLL is probably bigger than what you are trying to gain by hand-coding your memcpy().

In your first mail you said "In this patch, I eliminated calls to memcpy for small copies as they are particularly expensive in the MS CRT." Did you compare it to other toolchains? And did you compare it to a completely static build that doesn't reference msvcr90.dll? Were these functions on the other toolchains compiled into the binary, or did they also live in an external DLL?

When comparing 7-byte buffer copies, things like an indirect call to a library function have a much bigger impact than shaving a few assembler instructions off the loop itself, so maybe just passing /MT instead of /MD to the compiler makes the same difference. (It will certainly help with the whole program optimization.)

Looking at the TortoiseSVN build and my local set of binaries, I see that most of its libraries use MSVCR90.DLL, and that it at least calls memcpy() from that DLL in a few code paths.

Bert
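
A minimal sketch of how the comparison above could be measured, not the code from the patch. The helper names (small_copy, indirect_memcpy, copies_match, time_copies) are hypothetical, and the volatile function pointer is only an assumption standing in for a call through a DLL import table; real numbers would have to come from building the same file once with /MD and once with /MT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical hand-coded copy for short buffers (illustration only). */
static void small_copy(char *dst, const char *src, size_t len)
{
    while (len--)
        *dst++ = *src++;
}

/* Calling memcpy through a volatile function pointer forces an indirect,
   non-inlined call. This only approximates a call into a CRT DLL. */
typedef void *(*memcpy_fn)(void *, const void *, size_t);
static memcpy_fn volatile indirect_memcpy = memcpy;

/* Returns 1 if both copy paths produce identical output (len <= 16). */
static int copies_match(const char *src, size_t len)
{
    char a[16] = {0};
    char b[16] = {0};

    assert(len <= sizeof a);
    small_copy(a, src, len);
    indirect_memcpy(b, src, len);
    return memcmp(a, b, sizeof a) == 0;
}

/* Times `iterations` 7-byte copies and returns elapsed CPU seconds.
   The length is volatile so the compiler cannot treat the copy as
   fixed-size, and the volatile sink keeps the loop from being removed. */
static double time_copies(int use_indirect, size_t iterations)
{
    static const char src[8] = "abcdefg";
    char dst[8] = {0};
    volatile size_t len = 7;
    volatile char sink = 0;
    size_t i;
    clock_t start = clock();

    for (i = 0; i < iterations; i++) {
        if (use_indirect)
            indirect_memcpy(dst, src, len);
        else
            small_copy(dst, src, len);
        sink ^= dst[i % 7];
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing time_copies(0, n) against time_copies(1, n) for a large n, in both an /MD and an /MT build, would show whether the indirect call or the copy loop itself dominates for buffers this small.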