On Sat, 4 Jul 1998, Tuukka Toivonen wrote:

>I have made some functions in assembly and want to call them
>from C (or vice versa). I have already got it to work, however..

I suggest you learn and use gcc inline asm. The way gcc implements
inline asm is the best I have seen so far: you tell gcc which registers
the asm uses, so it can still optimize the code around it.
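
To make the suggestion concrete, here is a minimal sketch of gcc extended inline asm. The name add_asm is made up for illustration, and the plain-C branch is only there so the file still builds on non-x86 targets.

```c
/* Hypothetical example: add two ints with gcc extended inline asm. */
static int add_asm(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
  int sum;
  /* "=r": sum is an output in any general register gcc picks.
     "0":  a starts out in the same register as operand 0 (sum).
     "r":  b is an input in any general register.
     "cc": the add changes the condition flags.
     gcc does the register allocation, so nothing has to be
     saved or restored by hand. */
  __asm__("addl %2, %0"
          : "=r" (sum)
          : "0" (a), "r" (b)
          : "cc");
  return sum;
#else
  return a + b;   /* fallback for non-x86 targets */
#endif
}
```

Because the constraints describe exactly what the asm reads and writes, gcc is free to keep other values in the remaining registers across it.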

>What registers I should save and what registers are not needed
>to save? I have saved everything with 'pushad' but that's
>inefficient.

It depends on the architecture. I don't know the x86 calling convention
exactly.
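
For reference, the i386 System V ABI (this comes from the ABI document, not from this thread) treats eax, ecx and edx as scratch registers and expects ebx, esi, edi, ebp and esp to be preserved across a call. Below is a sketch of a hand-written routine under that assumption; add_one is a hypothetical name, and the asm branch assumes 32-bit Linux/ELF.

```c
/* Hypothetical routine; assumes the 32-bit cdecl convention on Linux/ELF. */
int add_one(int x);

#if defined(__i386__)
__asm__(
  ".text\n"
  ".globl add_one\n"
  ".type add_one, @function\n"
  "add_one:\n"
  "  movl 4(%esp), %eax\n"   /* first argument sits just above the return address */
  "  addl $1, %eax\n"        /* eax is scratch and also carries the return value */
  "  ret\n"                  /* no pushad/popad: only eax was touched */
);
#else
/* Plain C stand-in so the file builds on other targets. */
int add_one(int x) { return x + 1; }
#endif
```

A routine that only touches eax, ecx and edx needs no saves at all under this convention; one that uses ebx, esi or edi would push and pop just those.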

>>   GNU CC is normally configured to use the same function calling
>>convention normally in use on the target system. 
>
>But GNU CC _defines_ what is the normally used calling convention.
>So this is a recursive definition...

You only need to look at the assembler generated from a C program
compiled by gcc (objdump is your friend). At least this is my way to
discover the calling convention (safe and practical ;-).
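
As an illustration of that approach, a throwaway file like the one below is enough to see how arguments arrive and where the result goes. The function sum3 and the file name conv.c are made up; the commands in the comment are the usual gcc and objdump invocations.

```c
/* conv.c (hypothetical file name)
 *
 *   gcc -O2 -S conv.c     writes the generated assembler to conv.s
 *   gcc -O2 -c conv.c     writes the object file conv.o
 *   objdump -d conv.o     disassembles the object file
 *
 * Look at where sum3 reads a, b and c from, and which register
 * holds the result when it returns.
 */
int sum3(int a, int b, int c)
{
  /* Different weights help tell the three arguments apart in the output. */
  return a + 2 * b + 3 * c;
}
```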

>I'm using Linux 2.0.34, GCC 2.7.2 and x86 (Pentium).

For x86 all function parameters are passed on the stack (they are pushed
right to left, so the first push is the last argument), while the retval
is returned in eax if the function returns a 32-bit value. If the function
returns a large struct, a pointer to the struct is returned instead. I
don't know the details; if somebody knows them I would be interested too
(just out of curiosity). Of course you don't have to preserve eax inside
the function, since it has to contain the retval...
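
On the struct case, the i386 System V ABI has the caller reserve space for the result and pass its address as a hidden extra argument, so the two functions below end up doing much the same work. This is only a sketch of that idea in C; struct big, make_big and fill_big are invented names, and the exact code gcc emits is best checked in the generated assembler.

```c
/* Hypothetical type, too large to come back in eax. */
struct big {
  int v[8];
};

/* What you write: return the struct by value. */
struct big make_big(int seed)
{
  struct big b;
  int i;
  for (i = 0; i < 8; i++)
    b.v[i] = seed + i;
  return b;
}

/* Roughly what the compiler arranges (assumption, see above):
   the caller supplies the storage, the callee fills it in
   and hands the same pointer back. */
struct big *fill_big(struct big *out, int seed)
{
  int i;
  for (i = 0; i < 8; i++)
    out->v[i] = seed + i;
  return out;
}
```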

BTW, Glynn, some mails ago you said that x86 would _not_ perform much
better by passing function arguments in registers. I don't think that is
true, since for example the eax register does not have to be preserved at
all. It would be nice to pass the last parameter of the function call in
eax and the other parameters on the stack as usual. I think it would help
performance a lot. I'll try to measure the improvement.

...<Some time passed>...

Here is the example:

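#include <stdio.h>

/* Returns the low 32 bits of the Pentium time-stamp counter. */
static unsigned int TIME64(void)
{
  unsigned int dummy,low;
  /* volatile: every call must really execute rdtsc */
  __asm__ __volatile__("rdtsc"
          :"=a" (low),
          "=d" (dummy));
  return low;
}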

static int p_normal(unsigned int t)
{
  return ++t;
}

static int p_fast (unsigned int t) __attribute__ ((regparm(1)));
static int p_fast (unsigned int t)
{
  return ++t;
}

#define CYCLES 100

int main(void)
{
  int i, k, p = 0;
  unsigned int fast_start, fast_end, start, end, t1, t2;

  for (k=0; k<CYCLES; k++)
    {
      fast_start = TIME64();
      for (i=0; i<CYCLES; i++)
          p = p_fast(p);
      fast_end = TIME64();
    }

  for (k=0; k<CYCLES; k++)
    {
      start = TIME64();
      for (i=0; i<CYCLES; i++)
          p = p_normal(p);
      end = TIME64();
    }

  for (i=0; i<CYCLES; i++)
    {
      t1 = TIME64();
      t2 = TIME64();
    }

  /* the operands are unsigned, so print them with %u */
  printf ("Fast latency: %u, normal latency %u\n", fast_end-fast_start-(t2-t1),
          end-start-(t2-t1));
  return 0;
}

andrea@dragon:/tmp$ gcc regparm-pentium.c -O2
andrea@dragon:/tmp$ ./a.out
Fast latency: 1007, normal latency 1307

Using eax to pass the first argument we get a small improvement (about
300 cycles over the 100 timed calls here). Of course this example is the
best case for getting nice numbers out of regparm(1), but the improvement
exists in every case (except for `function(void)' of course). So really
only historical reasons are costing performance on x86...

Andrea[s] Arcangeli
