>> I also found from timing tests using hand-optimised assembler that calls
>> to the Win32 API Interlocked routines appeared to be optimised when the
>> code is compiled by MSVC, but not GCC (say). It was as though MSVC was
>> emitting optimised assembler on the fly instead of calling the routines
>> in Kernel32.dll. My timings showed that the standard Interlocked routine
>> calls compiled with MSVC were as fast or faster than my inlined
>> assembler without the LOCK prefix. The interlocked routines are used as
>> the basis for the mutex operations in pthreads-win32, and using the
>> assembler versions, I was able to cut the time for some of the pthreads-
>> win32 test applications involving saturated POSIX reader-writer lock
>> calls to nearly 1/3 for the gcc compiled versions, and match the times
>> produced by the MSVC compiled code.
>
> Now that's interesting! Did you disassemble what MSVC emits instead of
> calling the interlocked routines? How do they achieve atomic operations
> without the lock prefix to xadd, xchg or cmpxchg instructions?

However, when I compile on my Pentium-M, VS8 always calls the kernel32
routines (I tried all the /O? options, taking the declarations from
winbase.h): no intrinsics are used [1].

Using the intrinsic versions directly does work, though, and the code
is then inlined:

#include <intrin.h>

void foo(long * p)
{
    _InterlockedIncrement(p);
}

[d:\]cl -c /Ox t.cxx ^ dumpbin /DISASM t.obj
[...]
00000004: B9 01 00 00 00 mov ecx,1
00000009: F0 0F C1 08 lock xadd dword ptr [eax],ecx
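
For comparison, here is a minimal sketch of a compiler-neutral way to ask
for the same kind of inlined atomic increment, which may be of use for the
gcc builds mentioned above. It assumes a C++11 compiler with <atomic>, and
the helper name interlocked_increment is made up for illustration; it is
not part of Win32 or pthreads-win32. On x86 a fetch_add of this kind is
typically compiled to a lock xadd like the one in the listing above, but I
have not checked the generated code for every compiler.

```cpp
#include <atomic>

// Hypothetical helper (not part of any Win32 or pthreads-win32 API):
// atomically adds 1 to 'value' and returns the new value, mirroring the
// return convention of InterlockedIncrement.
inline long interlocked_increment(std::atomic<long>& value)
{
    // fetch_add returns the value held before the addition, so add 1
    // to report the incremented value.
    return value.fetch_add(1) + 1;
}
```

Calling it twice on a std::atomic<long> that starts at 0 returns 1 and
then 2, leaving the variable at 2.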
regards,
-Daniel
[1] http://msdn2.microsoft.com/en-us/library/2ddez55b(VS.80).aspx