>> I also found from timing tests using hand-optimised assembler that calls
>> to the Win32 API Interlocked routines appeared to be optimised when the
>> code is compiled by MSVC, but not GCC (say). It was as though MSVC was
>> emitting optimised assembler on the fly instead of calling the routines
>> in Kernel32.dll. My timings showed that the standard Interlocked routine
>> calls compiled with MSVC were as fast or faster than my inlined
>> assembler without the LOCK prefix. The interlocked routines are used as
>> the basis for the mutex operations in pthreads-win32, and using the
>> assembler versions, I was able to cut the time for some of the pthreads-
>> win32 test applications involving saturated POSIX reader-writer lock
>> calls to nearly 1/3 for the gcc compiled versions, and match the times
>> produced by the MSVC compiled code.
> 
> Now that's interesting! Did you disassemble what MSVC emits instead of 
> calling the interlocked routines? How do they achieve atomic operations 
> without the lock prefix on the xadd, xchg or cmpxchg instructions?

However, when I compile on my Pentium-M, VS8 always calls the kernel32
routines (I tried all the /O? options, with the declarations taken from
winbase.h): no intrinsics are generated [1].
Using the intrinsic versions directly does work, though, and the code is
inlined with the lock prefix in place:

#include <intrin.h>

void foo(long * p)
{
    _InterlockedIncrement(p);
}

[d:\]cl -c /Ox t.cxx ^ dumpbin /DISASM t.obj
[...]
  00000004: B9 01 00 00 00     mov         ecx,1
  00000009: F0 0F C1 08        lock xadd   dword ptr [eax],ecx

regards,
-Daniel

[1] http://msdn2.microsoft.com/en-us/library/2ddez55b(VS.80).aspx
