Hi,

I am using powerpc-eabi-gcc (3.4.1) and trying to retarget it for a
fully pipelined FPU. I have a DFA model for the FPU. I am looking at
the code produced for a simple FIR algorithm (a loop iterating over an
array, with one multiply-add operation per iteration). I am not using
the fused multiply-add instructions.

for (i = 0; i < 64; i++)
 accum += z[i] * h[i];

I have the FIR loop partially unrolled, yet I am not seeing the
multiply from, say, iteration i+1 overlapping with the multiply from
iteration i. The scheduling dumps show that the compiler knows each
use of a multiply result incurs the full multiply latency, rather than
having that latency hidden by software pipelining. The adds are also
fully serialized by data flow: the compiler does not seem to use
temporary registers so that some of the adds could execute in
parallel. Hence, each add stalls on the previous add.

fadds   f5,f0,f8
fadds   f4,f5,f6
fadds   f2,f4,f11
fadds   f1,f2,f3
fadds   f11,f1,f13

The register pressure is not very high. Registers f15-f31 are not used at all.
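
To make clear what I am hoping for, here is a hand-written sketch of
the kind of code I would like the compiler to produce by itself. The
function name fir_split and the four accumulators acc0..acc3 are
hypothetical and only for illustration. Each partial sum has its own
register, so consecutive adds no longer depend on each other. I
realize this changes the floating-point summation order, so the
compiler may not be allowed to do this on its own without something
like -ffast-math.

```c
#include <stddef.h>

/* Hypothetical hand-split FIR loop: four independent partial sums,
   so back-to-back adds do not form a single dependency chain. */
float fir_split(const float *z, const float *h, size_t n)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i;

    /* Main body: four independent multiply-adds per iteration. */
    for (i = 0; i + 4 <= n; i += 4) {
        acc0 += z[i]     * h[i];
        acc1 += z[i + 1] * h[i + 1];
        acc2 += z[i + 2] * h[i + 2];
        acc3 += z[i + 3] * h[i + 3];
    }

    /* Remainder when n is not a multiple of 4. */
    for (; i < n; i++)
        acc0 += z[i] * h[i];

    /* Combine the partial sums; note the order differs from the
       original loop, so rounding may differ slightly. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

With four accumulators, an add only has to wait for the add from the
previous loop iteration, not for the one immediately before it.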

My question is: am I expecting the wrong version of GCC to do this?
I saw the following thread about SMS (swing modulo scheduling)

http://gcc.gnu.org/ml/gcc/2003-09/msg00954.html

that seems relevant. Would GCC 4.x be a better version for my
requirement? If not, any ideas would be greatly appreciated.

Thanks in advance,
Vasanth
