> Anyone got a good reference on the cycle cost of function
> call overhead (compared by CPU family would be nice)?  I try 
> to figure it out myself, but am afraid that gcc is out-thinking me.

O.K. - I did a small test:

#include <stdio.h>
#include <sys/time.h>

int bla=0;

void inc_bla(void) {
        bla++;
}

#define NUM_REPS 1000000000

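/* Compile without optimization (as in the -S output below); at -O2 gcc
   may inline inc_bla() and the two loops would then measure the same thing. */
int main(void)
{
        struct timeval tv, tv2;
        int x;

        /* Loop 1: increment bla through a function call. */
        gettimeofday(&tv, NULL);
        for (x = 0; x < NUM_REPS; x++) {
                inc_bla();
        }
        gettimeofday(&tv2, NULL);
        /* Elapsed = tv2 - tv, borrowing one second if usec goes negative. */
        tv2.tv_sec  -= tv.tv_sec;
        tv2.tv_usec -= tv.tv_usec;
        if (tv2.tv_usec < 0) { tv2.tv_usec += 1000000; tv2.tv_sec--; }
        /* time_t/suseconds_t are not int: cast to long and print with %ld. */
        printf("%8d: %8ld.%06ld\n", bla, (long)tv2.tv_sec, (long)tv2.tv_usec);

        /* Loop 2: the same increment done inline. */
        gettimeofday(&tv, NULL);
        for (x = 0; x < NUM_REPS; x++) {
                bla++;
        }
        gettimeofday(&tv2, NULL);
        tv2.tv_sec  -= tv.tv_sec;
        tv2.tv_usec -= tv.tv_usec;
        if (tv2.tv_usec < 0) { tv2.tv_usec += 1000000; tv2.tv_sec--; }
        printf("%8d: %8ld.%06ld\n", bla, (long)tv2.tv_sec, (long)tv2.tv_usec);
        return 0;
}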

1000000000:       14.975934
2000000000:        8.860147

That suggests a plain function call costs (14.975934 s - 8.860147 s) / 10^9
calls = 6.115787*10^-9 seconds, i.e. about 6.1 ns per call.

As my P-III runs at 805.657 MHz according to /proc/cpuinfo, that works out
to 6.115787 ns * 805.657 MHz = about 4.93 cycles, so probably 5 cycles ...

Running gcc with -S yields, for inc_bla():

inc_bla:
        pushl   %ebp
        movl    %esp, %ebp
        incl    bla
        popl    %ebp
        ret

The loops yield identical code except for:

.L7:
        call    inc_bla
vs.
.L12:
        incl    bla

so apart from the call itself, any difference between the loops can only
be a minor extra penalty from code alignment or similar effects.

Indirect calling might be a bit more expensive, though.

CU, Andy

-- 
= Andreas Beck                    |  Email :  <[EMAIL PROTECTED]>             =
