At Thu, 18 Jun 2009 23:42:57 +0200,
José Luis García Pallero wrote:
> No loop unrolling: 0.005 s
> Loop unrolling: 0.6 s
>
> for(i=0;i<n;i++)
> {
> a = i*i+i;
> }
> }
I think the program below is probably more realistic for this
case. Given the huge difference between the two results, I suspect that
the compiler is able to overoptimise the simple case above. Maybe you
could time this program, or the actual function, instead.
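#include <stdlib.h>
#include <time.h>
#include <stdio.h>

int
main (int argc, char *argv[])
{
  int n, m, i, j;
  double *a, *x;
  clock_t t0, t1, t2;
  double A = 3, B = 2;

  if (argc < 3)
    {
      fprintf (stderr, "usage: %s n m\n", argv[0]);
      return 1;
    }

  n = atoi (argv[1]);           /* vector length */
  m = atoi (argv[2]);           /* number of repetitions */

  a = malloc (sizeof (double) * n);
  x = malloc (sizeof (double) * n);
  if (a == NULL || x == NULL)
    {
      fprintf (stderr, "out of memory\n");
      return 1;
    }

  /* give x defined values before it is read */
  for (i = 0; i < n; i++)
    x[i] = i;

  t0 = clock ();

  /* plain loop */
  for (j = 0; j < m; j++)
    for (i = 0; i < n; i++)
      {
        a[i] = A * x[i] + B;
      }

  t1 = clock ();

  /* loop unrolled by hand, four elements per iteration */
  for (j = 0; j < m; j++)
    {
      for (i = 0; i + 3 < n; i += 4)
        {
          a[i] = A * x[i] + B;
          a[i + 1] = A * x[i + 1] + B;
          a[i + 2] = A * x[i + 2] + B;
          a[i + 3] = A * x[i + 3] + B;
        }
      for (; i < n; i++)        /* leftover elements when n % 4 != 0 */
        a[i] = A * x[i] + B;
    }

  t2 = clock ();

  printf ("operations = %g\n", (double) n * m);
  printf ("plain loop = %g s\n", (double) (t1 - t0) / CLOCKS_PER_SEC);
  printf ("fancy loop = %g s\n", (double) (t2 - t1) / CLOCKS_PER_SEC);

  free (a);
  free (x);
  return 0;
}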
_______________________________________________
Help-gsl mailing list
http://lists.gnu.org/mailman/listinfo/help-gsl