Jens-Heiner Rechtien wrote:
Hi,
I've done some additional, very simple-minded measurements to estimate
the effects of inlining the reference counters and the potential
overhead of checking whether we are on an SMP system. I got the
following numbers:
I: inlining
NOI: no-inlining
SMPC: SMP check
NOSMPC: no SMP check
Times are in seconds.
CPU                     NOI/NOSMPC  I/NOSMPC    NOI/SMPC    I/SMPC
P-IV 1800 (single)      7.634       6.892       1.796       0.784
Xeon 3.06GHz (multi)    6.50        4.07        6.67        4.11
Conclusions: Checking for SMP costs about 1% extra on multi-processor
machines (4.11s vs. 4.07s), and yields roughly an 8.8-fold speed-up on
older non-HT/non-multiprocessor systems (0.784s vs. 6.892s). Inlining is
significant, too: on the Xeon it cuts the time from 6.50s to 4.07s, an
effect that dwarfs the penalty for checking for SMP on modern
multi-processor systems.
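As a worked check, the figures in the conclusions are simple ratios of the measured times in the table above (no data beyond the table is used):

```latex
\begin{align*}
\text{SMP-check overhead (Xeon, inlined)} &= \frac{4.11}{4.07} - 1 \approx 0.0098 \approx 1\%\\
\text{Speed-up from the SMP check (P-IV, inlined)} &= \frac{6.892}{0.784} \approx 8.79\\
\text{Speed-up from inlining (Xeon, no SMP check)} &= \frac{6.50}{4.07} \approx 1.60
\end{align*}
```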
Great result for older machines, which is, I assume, where any
improvement is needed most. I'm curious as to why the call overhead is
such a large proportion of the Xeon result (37%). Was the total number
of calls to incrementInterlockedCount() the same for both P-IV and Xeon?
It looks as though the Xeon either doesn't lock the bus in this test,
or handles the lock a lot more efficiently. I think you mentioned
earlier that this was possible.
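Ross's 37% is the share of the Xeon's non-inlined time that inlining removes (no SMP check). For comparison, the same ratio for the locked path on the P-IV, computed only from the table:

```latex
\begin{align*}
\text{Xeon:} &\quad \frac{6.50 - 4.07}{6.50} = \frac{2.43}{6.50} \approx 0.374 \approx 37\%\\
\text{P-IV:} &\quad \frac{7.634 - 6.892}{7.634} = \frac{0.742}{7.634} \approx 0.097 \approx 10\%
\end{align*}
```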
Ross
The measurements were done with the simple benchmark attached; they
are, of course, no substitute for real profiling of the office code.
Heiner
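The benchmark's driver (the source behind intrlock.o) is not reproduced in the attachments. A minimal sketch of what such a driver might look like, assuming GCC or Clang: the __atomic_add_fetch builtin stands in for the x86 inline assembly, and the names benchCounter, incrementOutOfLine, incrementInline and runBenchmark are hypothetical. It only mirrors the inlined vs. out-of-line (I vs. NOI) distinction, not the SMP check.

```c
#include <time.h>

/* Hypothetical shared counter; a global so the updates cannot be
   treated as private to runBenchmark(). */
int benchCounter;

/* noinline forces a real call per increment, like the NOI builds.
   __atomic_add_fetch returns the new value, as the asm version does. */
__attribute__((noinline)) int incrementOutOfLine(int *p) {
    return __atomic_add_fetch(p, 1, __ATOMIC_SEQ_CST);
}

/* Expanded in place at -O2, like the I builds. */
static inline int incrementInline(int *p) {
    return __atomic_add_fetch(p, 1, __ATOMIC_SEQ_CST);
}

/* Times `iterations` increments through each variant (CPU time via
   clock()) and returns the final counter value, 2 * iterations. */
int runBenchmark(long iterations, double *outOfLineSecs, double *inlineSecs) {
    benchCounter = 0;
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        incrementOutOfLine(&benchCounter);
    clock_t mid = clock();
    for (long i = 0; i < iterations; i++)
        incrementInline(&benchCounter);
    clock_t end = clock();
    *outOfLineSecs = (double)(mid - start) / CLOCKS_PER_SEC;
    *inlineSecs = (double)(end - mid) / CLOCKS_PER_SEC;
    return benchCounter;
}
```

Called from a main() compiled with -O2 and a large iteration count, the two timings correspond only loosely to the NOI and I columns; absolute numbers will differ from the table. On x86, a sequentially consistent __atomic_add_fetch is normally emitted as a lock-prefixed instruction, so this sketch has no unlocked (uniprocessor) variant.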
------------------------------------------------------------------------
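# Uncomment exactly one CFLAGS line (I/SMPC, I/NOSMPC, NOI/SMPC, NOI/NOSMPC);
# run "make clean" after switching, since make does not track CFLAGS.
CFLAGS= -I. -fPIC -O2 -Wall -DINLINE -DCHECKSMP
#CFLAGS= -I. -fPIC -O2 -Wall -DINLINE
#CFLAGS= -I. -fPIC -O2 -Wall -DCHECKSMP
#CFLAGS= -I. -fPIC -O2 -Wall

.PHONY: all clean

all: intrlock libsal.so

# Run with LD_LIBRARY_PATH=. ./intrlock so the loader finds libsal.so.
intrlock: intrlock.o libsal.so
	$(CC) $(CFLAGS) -o intrlock $< -L. -lsal

libsal.so: sal.o
	$(CC) -shared -o libsal.so $<

clean:
	rm -f *.o libsal.so intrlock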
------------------------------------------------------------------------
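/* Nonzero on multi-processor (including Hyper-Threading) machines;
   defined elsewhere. */
extern int is_smp;

#if defined(INLINE)
#if defined(CHECKSMP)
/* Atomically increments *p and returns the new value. */
static inline int incrementInterlockedCount(int *p) {
    int n;
    if ( is_smp ) {
        /* The lock prefix makes xadd atomic across processors. */
        __asm__ __volatile__ (
            "movl $1, %0\n\t"
            "lock\n\t"
            "xaddl %0, %1\n\t"  /* n = old *p, *p = old *p + 1 */
            "incl %0"           /* n = new value */
            : "=&r" (n), "+m" (*p)
            :
            : "memory", "cc");
    }
    else {
        /* On a uniprocessor a single xadd cannot be interrupted midway,
           so the expensive lock prefix can be omitted. */
        __asm__ __volatile__ (
            "movl $1, %0\n\t"
            "xaddl %0, %1\n\t"
            "incl %0"
            : "=&r" (n), "+m" (*p)
            :
            : "memory", "cc");
    }
    return n;
}
#else /* !CHECKSMP */
/* Atomically increments *p and returns the new value. */
static inline int incrementInterlockedCount(int *p) {
    int n;
    __asm__ __volatile__ (
        "movl $1, %0\n\t"
        "lock\n\t"
        "xaddl %0, %1\n\t"
        "incl %0"
        : "=&r" (n), "+m" (*p)
        :
        : "memory", "cc");
    return n;
}
#endif /* !CHECKSMP */
#else /* !INLINE */
/* Out-of-line version, linked from libsal.so. */
int incrementInterlockedCount(int *p);
#endif /* !INLINE */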
------------------------------------------------------------------------
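The header only declares is_smp; where and how it is set is not shown in the attachments. A minimal sketch of one way to initialise it on Linux or BSD, assuming sysconf(_SC_NPROCESSORS_ONLN) (a widely available extension, not strict POSIX); the names smpFromCpuCount and initSmpFlag are hypothetical. Because it counts logical processors, a Hyper-Threading CPU is treated as SMP, which is what the lock decision needs.

```c
#include <unistd.h>

/* Hypothetical definition: defaults to the locked (safe) path
   until initSmpFlag() has run. */
int is_smp = 1;

/* Maps a processor count to the SMP flag; an error (-1) is treated
   as SMP so the lock prefix is never wrongly dropped. */
int smpFromCpuCount(long ncpu) {
    return ncpu != 1;
}

/* Hypothetical initialiser: call once at startup, before any other
   thread is created, since is_smp is read without synchronisation. */
void initSmpFlag(void) {
    /* _SC_NPROCESSORS_ONLN reports the number of online logical CPUs. */
    is_smp = smpFromCpuCount(sysconf(_SC_NPROCESSORS_ONLN));
}
```

Defaulting to 1 means that code running before initialisation pays for the lock but stays correct.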
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]