Hi,
Consider the following sample code:
#define N 4
float a[N] __attribute__ ((aligned (16)));
float b[N] __attribute__ ((aligned (16)));
float c[N] __attribute__ ((aligned (16)));
double d[N] __attribute__ ((aligned (16)));
void f(float *pa, float *pb);
int main(){
int i;
f(a,b);
for(i=0; i<N; i++){
c[i] = b[i] + a[i] + d[i];
}
return 0;
}
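In the above code, a, b, and c are float arrays and d is a double
array. Thus, the float sum b[i]+a[i] has to be converted to double
before d[i] is added, and the result is converted back to float when
it is stored to c[i].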
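To make the implicit conversions explicit, one iteration of the loop
can be written out step by step. This is only a sketch: step_explicit
is a hypothetical helper name that is not part of the sample, and it
assumes float arithmetic is evaluated in single precision (as with
-mfpmath=sse).

```c
/* Sketch only: step_explicit is a hypothetical helper, not part of the
   original sample. It spells out the conversions implied by
   c[i] = b[i] + a[i] + d[i], assuming float expressions are evaluated
   in single precision. */
static float step_explicit(float a, float b, double d)
{
    float  sum_f = b + a;          /* single-precision add      */
    double sum_d = (double)sum_f;  /* float -> double conversion */
    double res_d = sum_d + d;      /* double-precision add      */
    return (float)res_d;           /* double -> float conversion */
}
```

Calling step_explicit(a[i], b[i], d[i]) for each i should give the same
value that the original loop stores to c[i].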
I compile with the command line "gcc-4.6 -O3 -msse2 -msse3
-mfpmath=sse -ftree-vectorize -mmmx -msse4.1 -msse -S". As expected,
it generates many SSE instructions, as follows:
.....
movaps b(%rip), %xmm0
xorl %eax, %eax
xorps %xmm1, %xmm1
addps a(%rip), %xmm0
movhlps %xmm0, %xmm1
cvtps2pd %xmm0, %xmm2
cvtps2pd %xmm1, %xmm1
addpd d(%rip), %xmm2
addpd d+16(%rip), %xmm1
cvtpd2ps %xmm2, %xmm0
cvtpd2ps %xmm1, %xmm1
movlhps %xmm1, %xmm0
movaps %xmm0, c(%rip)
addq $8, %rsp
.....
However, when I use arm-linux-gnueabihf-gcc-4.6 with the flags "-static
-mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize
-mvectorize-with-neon-quad -ftree-slp-vectorize -march=armv7-a
-mtune=cortex-a15 -O3 -Ofast -S", it generates only scalar
instructions, without any NEON instructions.
Although NEON doesn't support double-precision floating point, the
compiler could still vectorize the float addition b + a with NEON
first, and then use scalar instructions for the rest of the
computation.
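In source form, the split I have in mind would look roughly like the
sketch below. The names add_split and tmp are hypothetical and used
only for illustration, and I have not confirmed that gcc-4.6 actually
emits NEON instructions for the first loop.

```c
#define N 4

/* Sketch only: add_split and tmp are hypothetical names. The float-only
   addition is separated from the part that needs double precision. */
void add_split(const float *a, const float *b, const double *d, float *c)
{
    float tmp[N];
    int i;

    /* Single precision only: a candidate for a NEON vector add. */
    for (i = 0; i < N; i++)
        tmp[i] = b[i] + a[i];

    /* Needs double precision, which NEON on ARMv7 does not provide,
       so this loop is expected to stay scalar. */
    for (i = 0; i < N; i++)
        c[i] = (float)((double)tmp[i] + d[i]);
}
```

This should compute the same values as the original loop; only the
order of the work is changed so that the float addition is isolated.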
Is there a reason why arm-linux-gnueabihf-gcc-4.6 doesn't generate
any NEON instructions for this code?
Any help appreciated.
Thank you very much,
Sheng Yu