Hello All:
I would like to propose calculating the unrolling factor based on data reuse
between different iterations. Unrolling combines the data reuse of different
iterations into a single iteration. A MaxFactor bounds the unroll factor that
is calculated from data reuse. MaxFactor is calculated as
(MaxInsts/LoopBodyInsts). MaxInsts is the number of instructions that does not
degrade the instruction cache. The calculation of MaxInsts also considers the
def-use distance, because a longer def-use distance increases the live range
width; the def-use limit is multiplied with the MaxInsts calculated based on
the instruction cache.
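
As a rough illustration of the MaxFactor calculation, here is a minimal C
sketch. The function name and the clamp to 1 are my assumptions, not existing
code; MaxInsts is taken as an input and the def-use adjustment is not modelled.

```c
#include <assert.h>

/* Hypothetical helper: upper bound on the unroll factor.
 * max_insts       - assumed instruction budget for the unrolled loop body
 * loop_body_insts - number of instructions in one iteration of the loop */
unsigned max_unroll_factor(unsigned max_insts, unsigned loop_body_insts)
{
    assert(loop_body_insts > 0);   /* an empty body has nothing to unroll */

    unsigned factor = max_insts / loop_body_insts;

    /* Assumption: a factor of 1 means "do not unroll", so never go below it. */
    return factor > 0 ? factor : 1;
}
```

For example, a budget of 64 instructions and a loop body of 8 instructions
would allow an unroll factor of at most 8.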
The following example is from the Livermore Loops benchmarks.
The value computed in the previous iteration is used again in the current
iteration, which gives the data reuse. Unrolling the loop in Fig(1) lets that
data be reused within a single iteration and thus increases performance. The
loop unrolled 3 times is given in Fig(2), based on the algorithm for
calculating the unrolling factor given in Fig(3).
for (i = 1; i < size; i++)
{
  x[i] = z[i] * (y[i] - x[i-1]);
}
Fig(1).
for (i = 1; i + 2 < size; i += 3)
{
  x[i]   = z[i]   * (y[i]   - x[i-1]);
  x[i+1] = z[i+1] * (y[i+1] - x[i]);
  x[i+2] = z[i+2] * (y[i+2] - x[i+1]);
}
/* The remaining (size - 1) % 3 iterations run in the original loop form. */
Fig(2).
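
To make the reuse concrete, here is a sketch of the rolled and the unrolled
loop as compilable C functions. It assumes that each unrolled statement uses
its own index (i, i+1, i+2), that x, y and z do not overlap, and that a
remainder loop handles trip counts that are not a multiple of 3. The function
names and the use of double are my choices for illustration.

```c
#include <stddef.h>

/* Rolled loop, as in Fig(1). */
void recurrence_rolled(double *x, const double *y, const double *z, size_t size)
{
    for (size_t i = 1; i < size; i++)
        x[i] = z[i] * (y[i] - x[i - 1]);
}

/* Unrolled by 3. Each result is kept in a local so the next statement
 * reuses it directly instead of reloading x[] from memory. */
void recurrence_unrolled3(double *x, const double *y, const double *z, size_t size)
{
    if (size < 2)
        return;                    /* no iterations to run */

    size_t i = 1;
    double prev = x[0];            /* value carried between iterations */

    for (; i + 2 < size; i += 3) {
        double a = z[i]     * (y[i]     - prev);
        double b = z[i + 1] * (y[i + 1] - a);
        double c = z[i + 2] * (y[i + 2] - b);
        x[i]     = a;
        x[i + 1] = b;
        x[i + 2] = c;
        prev = c;
    }

    /* Remainder loop for the last (size - 1) % 3 iterations. */
    for (; i < size; i++) {
        prev = z[i] * (y[i] - prev);
        x[i] = prev;
    }
}
```

Both functions perform the same operations in the same order, so they should
produce the same x[] for the same inputs.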
Algorithm for unroll factor based on data reuse.
if (data_reuse == true)
{
  unroll_factor = reuse_distance + 1;
  if (unroll_factor < MaxFactor)
    return unroll_factor;
  else
  {
    unroll_factor = MaxFactor - unroll_factor;
    if (unroll_factor < MaxDistance)
      return unroll_factor;
  }
}
Fig ( 3).
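
The algorithm in Fig(3) could be written as a C function along these lines.
This is only a sketch: the function name and parameter names are hypothetical,
and returning 1 (no unrolling) when no branch yields a factor is my assumption,
since Fig(3) does not say what happens in that case. Note that on the else
path MaxFactor - unroll_factor is zero or negative, so the sketch only accepts
a positive result there.

```c
#include <stdbool.h>

/* Hypothetical C rendering of Fig(3).
 * Returns the chosen unroll factor, or 1 (no unrolling) as an assumed
 * fallback when Fig(3) does not specify a result. */
int reuse_unroll_factor(bool data_reuse, int reuse_distance,
                        int max_factor, int max_distance)
{
    if (!data_reuse)
        return 1;                  /* assumption: no reuse, no unrolling */

    int unroll_factor = reuse_distance + 1;
    if (unroll_factor < max_factor)
        return unroll_factor;

    /* As in Fig(3); this value is <= 0 whenever this path is reached. */
    unroll_factor = max_factor - unroll_factor;
    if (unroll_factor > 0 && unroll_factor < max_distance)
        return unroll_factor;

    return 1;                      /* assumption: fall back to no unrolling */
}
```

With hypothetical inputs of reuse distance 2, MaxFactor 8 and MaxDistance 4,
the function returns 3.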
In the example given above in Fig(1), the data reuse distance and the MaxFactor
are calculated based on the maximum number of instructions that does not
degrade the instruction cache and does not exceed the maximum limit on def-use
distance, beyond which the live range width increases. With these taken into
account, the loop is unrolled 3 times.
Thoughts, please?
Thanks & Regards
Ajit