Re: [dev] x86 osl/interlck.h performance

Ross Johnson Fri, 12 May 2006 19:13:36 -0700

Jens-Heiner Rechtien wrote:

Hi,
I've done some additional, very simple-minded measurements to estimate the effects of inlining the reference counters and the potential overhead of checking whether we are on an SMP system. I got the following numbers:
I:      inlining
NOI:    no-inlining
SMPC:   SMP check
NOSMPC: no SMP check

Times are in seconds.

                      NOI/NOSMPC  I/NOSMPC  NOI/SMPC  I/SMPC
P-IV 1800 (single)      7.634      6.892     1.796    0.784
Xeon 3.06GHz (multi)    6.50       4.07      6.67     4.11
Conclusions: Checking for SMP costs about 1% extra (4.11s vs. 4.07s) on multi-processor machines, and yields a speedup of about 8.8x (6.892s vs. 0.784s) on older non-HT/non-multiprocessor systems. Inlining is significant, too. The effect of inlining dwarfs the penalty for checking for SMP on modern multi-processor systems.

Great result for older machines, which is, I assume, where any improvement is needed most. I'm curious as to why the call overhead is such a large proportion of the Xeon result (37%). Was the total number of calls to incrementInterlockedCount() the same for both the P-IV and the Xeon? It looks as though the Xeon either doesn't lock the bus in this test, or is a lot more efficient at it. I think you mentioned earlier that this was possible.
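
The percentages discussed in this thread can be re-derived from the timing table. The following is only a small sketch of that arithmetic; the function names are made up for illustration and are not part of the benchmark.

```c
#include <stdio.h>

/* Extra time of `slow` over `fast`, as a percentage of `fast`. */
double overhead_pct(double slow, double fast) {
    return (slow - fast) / fast * 100.0;
}

/* How many times faster `after` is than `before`. */
double speedup(double before, double after) {
    return before / after;
}

/* Share of `before` that disappears when going to `after`, in percent. */
double saved_pct(double before, double after) {
    return (before - after) / before * 100.0;
}

/* Prints the three derived figures using the timings from the table. */
void print_summary(void) {
    /* Xeon, inlined: I/SMPC vs. I/NOSMPC */
    printf("SMP check cost on Xeon: %.1f%%\n", overhead_pct(4.11, 4.07));
    /* P-IV, inlined: I/NOSMPC vs. I/SMPC */
    printf("SMP check speedup on P-IV: %.1fx\n", speedup(6.892, 0.784));
    /* Xeon, no SMP check: NOI/NOSMPC vs. I/NOSMPC */
    printf("Call overhead on Xeon: %.0f%%\n", saved_pct(6.50, 4.07));
}
```

Calling print_summary() from a main() prints the three figures; the 37% is the share of the non-inlined Xeon time that goes away with inlining.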


Ross

The measurements were done with the simple benchmark attached; they are, of course, no substitute for doing some real profiling with the office code.
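
The benchmark driver itself is not reproduced in this post. As a rough idea of what such a measurement loop could look like, here is a sketch under stated assumptions: time_increments() and increment_locked() are hypothetical names, and the GCC/Clang builtin __atomic_add_fetch stands in for the lock-prefixed xaddl so the sketch is not tied to x86.

```c
#include <time.h>

/* Portable stand-in for the locked increment (GCC/Clang builtin);
   returns the new value, like incrementInterlockedCount(). */
static inline int increment_locked(int *p) {
    return __atomic_add_fetch(p, 1, __ATOMIC_SEQ_CST);
}

/* Performs `iterations` increments on *counter and returns the
   CPU time spent, in seconds. Hypothetical helper, not the
   attached benchmark. */
double time_increments(int *counter, long iterations) {
    clock_t start = clock();
    long i;
    for (i = 0; i < iterations; ++i)
        increment_locked(counter);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Because the increment is atomic, the compiler cannot drop the loop, so the counter ends at exactly the number of iterations requested.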


Heiner

------------------------------------------------------------------------

CFLAGS= -I. -fPIC -O2 -Wall -DINLINE -DCHECKSMP
#CFLAGS= -I. -fPIC -O2 -Wall -DINLINE
#CFLAGS= -I. -fPIC -O2 -Wall -DCHECKSMP
#CFLAGS= -I. -fPIC -O2 -Wall

intrlock: intrlock.o libsal.so
        $(CC) $(CFLAGS) -o intrlock $< -L. -lsal

libsal.so: sal.o
        $(CC) -shared -o libsal.so $<


clean:
        rm -f *.o libsal.so intrlock

all: intrlock libsal.so

------------------------------------------------------------------------

/* Nonzero when the lock-prefixed path must be used (more than one processor). */
extern int is_smp;

#if defined(INLINE)
#if defined(CHECKSMP)
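/* Increments *p and returns the new value. */
inline int incrementInterlockedCount(int *p) {
   int n;
   if ( is_smp ) {
       /* SMP: the lock prefix makes the xaddl atomic across processors. */
       __asm__ __volatile__ (
           "movl $1, %0\n\t"
           "lock\n\t"
           "xaddl %0, %2\n\t"
           "incl %0" :
           "=&r" (n), "=m" (*p) :
           "m" (*p) :
           "memory");
   }
   else {
       /* Single processor: same sequence without the lock prefix. */
       __asm__ __volatile__ (
           "movl $1, %0\n\t"
           "xaddl %0, %2\n\t"
           "incl %0" :
           "=&r" (n), "=m" (*p) :
           "m" (*p) :
           "memory");
   }
   /* xaddl left the old value in n; incl turned it into the new one. */
   return n;
}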
#else /* !CHECKSMP */
/* Always uses the lock prefix, regardless of processor count. */
inline int incrementInterlockedCount(int *p) {
   int n;
   __asm__ __volatile__ (
       "movl $1, %0\n\t"
       "lock\n\t"
       "xaddl %0, %2\n\t"
       "incl %0" :
       "=&r" (n), "=m" (*p) :
       "m" (*p) :
       "memory");
   return n;
}
#endif /* !CHECKSMP */
#else  /* INLINE */
int incrementInterlockedCount(int *p);
#endif  /* INLINE */

------------------------------------------------------------------------
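
The header above only declares is_smp; how it gets set is not shown in this post. One possible way to initialise it, sketched here as an assumption, is to ask for the number of online processors with sysconf(_SC_NPROCESSORS_ONLN), which is available on Linux and several other Unix systems but is not guaranteed everywhere. detect_smp() is a hypothetical name.

```c
#include <unistd.h>

/* Default to the safe, lock-prefixed path until detection has run. */
int is_smp = 1;

/* Hypothetical startup helper: clears is_smp only when exactly one
   processor is reported online. An error return (-1) leaves the
   locked path selected. */
void detect_smp(void) {
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    is_smp = (online == 1) ? 0 : 1;
}
```

A count taken once at startup would not notice processors brought online later, so a real implementation would need to consider that case.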

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

