Hello,

I am just playing with loop unrolling on trees (for the purposes of
prefetching), and I encountered the following problem.  Consider loop

sum1 = 0;
sum2 = 0;
for (i = 0; i < n; i++)
  {
    x_1 = a[i];
    y_1 = b[i];
    sum1 += x_1 * y_1;
    sum2 += x_1 / y_1;
  }

Now after unrolling we get (with some handling for remaining iterations
of the loop)

sum1 = 0;
sum2 = 0;
for (i = 0; i < n; i+=4)
  {
    x_1 = a[i];
    y_1 = b[i];
    sum1 += x_1 * y_1;
    sum2 += x_1 / y_1;
    x_2 = a[i+1];
    y_2 = b[i+1];
    sum1 += x_2 * y_2;
    sum2 += x_2 / y_2;
    x_3 = a[i+2];
    y_3 = b[i+2];
    sum1 += x_3 * y_3;
    sum2 += x_3 / y_3;
    x_4 = a[i+3];
    y_4 = b[i+3];
    sum1 += x_4 * y_4;
    sum2 += x_4 / y_4;
  }

So far OK, but with ter, this becomes

sum1 = 0;
sum2 = 0;
for (i = 0; i < n; i+=4)
  {
    x_1 = a[i];
    y_1 = b[i];
    x_2 = a[i+1];
    y_2 = b[i+1];
    x_3 = a[i+2];
    y_3 = b[i+2];
    x_4 = a[i+3];
    y_4 = b[i+3];
    sum1 += x_1 * y_1 + x_2 * y_2 + x_3 * y_3 + x_4 * y_4;
    sum2 += x_1 / y_1 + x_2 / y_2 + x_3 / y_3 + x_4 / y_4;
  }

Now we need some 11 registers for the loop, instead of the original 5
(and the number of registers grows with the unroll factor).

This causes huge regressions (more than 60% on sixtrack from spec2000).
It might perhaps be a good idea to limit the size of expressions ter
produces, or in some other way restrict the way it increases the lives
of registers?

Zdenek

Reply via email to