> Anyone got a good reference on the cycle cost of function
> call overhead (compared by CPU family would be nice)? I tried
> to figure it out myself, but am afraid that gcc is out-thinking me.
O.K. - I did a small test:
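#include <stdio.h>
#include <sys/time.h>

int bla = 0;

void inc_bla(void)
{
    bla++;
}

#define NUM_REPS 1000000000

int main(void)
{
    struct timeval tv, tv2;
    int x;

    /* Loop 1: increment bla through a function call. */
    gettimeofday(&tv, NULL);
    for (x = 0; x < NUM_REPS; x++) {
        inc_bla();
    }
    gettimeofday(&tv2, NULL);
    /* Elapsed time, borrowing one second if the microseconds went negative. */
    tv2.tv_sec  -= tv.tv_sec;
    tv2.tv_usec -= tv.tv_usec;
    if (tv2.tv_usec < 0) { tv2.tv_usec += 1000000; tv2.tv_sec--; }
    /* time_t and suseconds_t need not be int, so cast for printf. */
    printf("%8d: %8ld.%06ld\n", bla, (long)tv2.tv_sec, (long)tv2.tv_usec);

    /* Loop 2: the same increment done inline. */
    gettimeofday(&tv, NULL);
    for (x = 0; x < NUM_REPS; x++) {
        bla++;
    }
    gettimeofday(&tv2, NULL);
    tv2.tv_sec  -= tv.tv_sec;
    tv2.tv_usec -= tv.tv_usec;
    if (tv2.tv_usec < 0) { tv2.tv_usec += 1000000; tv2.tv_sec--; }
    printf("%8d: %8ld.%06ld\n", bla, (long)tv2.tv_sec, (long)tv2.tv_usec);

    return 0;
}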
1000000000: 14.975934
2000000000: 8.860147
Subtracting the two timings, (14.975934 - 8.860147) s / 10^9 calls, a normal
function call seems to cost about 6.116*10^-9 seconds.
My P-III runs at 805.657 MHz according to /proc/cpuinfo, which gives
about 4.93 cycles per call, probably 5 cycles in practice.
Running gcc with -S yields:
inc_bla:
pushl %ebp
movl %esp, %ebp
incl bla
popl %ebp
ret
The loops yield identical code except for:
.L7:
call inc_bla
vs.
.L12:
incl bla
Apart from the call itself, the different loop body could only add a minor
penalty, e.g. from alignment effects.
Indirect calls (through a function pointer) might be a bit more expensive, though.
CU, Andy
--
= Andreas Beck | Email : <[EMAIL PROTECTED]> =