Re: Performance of floating point instructions

Siarhei Siamashka Wed, 10 Mar 2010 11:54:51 -0800

On Wednesday 10 March 2010, Laurent Desnogues wrote:
> On Wed, Mar 10, 2010 at 7:29 PM, Alberto Mardegan
> > So, it seems that there's a huge improvement when switching from doubles
> > to floats, although I wonder if it's because of the FPU or just because
> > the amount of data passed around is smaller.
> > On the other hand, the improvement obtained by enabling the fast FPU
> > mode is rather small -- but that might be due to the fact that the FPU
> > operations are not a major player in this piece of code.
>
> The "fast" mode only gains 1 or 2 cycles per FP instruction.
> The FPU on Cortex-A8 is not pipelined and the fast mode
> can't change that :-)


It's probably
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344j/ch16s07s01.html
vs.
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344j/BCGEIHDJ.html

I wonder why the compiler does not use real NEON instructions when the
-ffast-math option is given. That should be quite useful even for scalar code.

Something like:

vld1.32  {d0[0]}, [r0]
vadd.f32 d0, d0, d0
vst1.32  {d0[0]}, [r0]

instead of:

flds     s0, [r0]
fadds    s0, s0, s0
fsts     s0, [r0]

for:

*float_ptr = *float_ptr + *float_ptr;

At least NEON is pipelined and should be a lot faster on more complex code
examples where it can actually benefit from pipelining. On x86, SSE2 is used
quite nicely for floating point math.
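
Until the compiler does this on its own, the NEON sequence can be requested
by hand with intrinsics from arm_neon.h. Below is a minimal sketch, not a
tested recipe: the function name double_in_place is made up for illustration,
and the comments only show which instruction each intrinsic is intended to
map to (the compiler is free to pick something else). On targets without
NEON it falls back to the plain C statement from above.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: doubles the float that float_ptr points to. */
void double_in_place(float *float_ptr)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Start from a zeroed 2-lane vector so that lane 1 is well defined. */
    float32x2_t v = vdup_n_f32(0.0f);
    v = vld1_lane_f32(float_ptr, v, 0);  /* intended: vld1.32 {d0[0]}, [r0] */
    v = vadd_f32(v, v);                  /* intended: vadd.f32 d0, d0, d0   */
    vst1_lane_f32(float_ptr, v, 0);      /* intended: vst1.32 {d0[0]}, [r0] */
#else
    /* Scalar fallback for builds without NEON. */
    *float_ptr = *float_ptr + *float_ptr;
#endif
}
```

Calling double_in_place(&x) with x = 1.5f leaves 3.0f in x on either path.
With GCC, the NEON path needs something like -mfpu=neon -mfloat-abi=softfp on
the command line. Checking the generated code with -S is still a good idea,
since I have not verified what a particular compiler version emits here.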

-- 
Best regards,
Siarhei Siamashka
_______________________________________________
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers
