https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89101
Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
      Known to work|                            |9.0
            Version|unknown                     |8.2.0
   Target Milestone|---                         |9.0
      Known to fail|                            |8.2.0

--- Comment #3 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Gael Guennebaud from comment #2)
> Indeed, it fails to remove the dup only if the coefficient is used multiple
> times as in the following reduced example: (https://godbolt.org/z/hmSaE0)
>
>
> #include <arm_neon.h>
>
> void foo(const float* a, const float * b, float * c, int n) {
>   float32x4_t c0, c1, c2, c3;
>   c0 = vld1q_f32(c+0*4);
>   c1 = vld1q_f32(c+1*4);
>   for(int k=0; k<n; k++)
>   {
>     float32x4_t a0 = vld1q_f32(a+0*4+k*4);
>     float32x4_t b0 = vld1q_f32(b+k*4);
>     c0 = vfmaq_laneq_f32(c0, a0, b0, 0);
>     c1 = vfmaq_laneq_f32(c1, a0, b0, 0);
>   }
>   vst1q_f32(c+0*4, c0);
>   vst1q_f32(c+1*4, c1);
> }
>
>
> I tested with gcc 7 and 8.

Confirmed for GCC 8, fixed on trunk. I tried the above example with up to 4
uses, and trunk always generates the expected code. So this is fixed for GCC 9;
however, it seems unlikely that the fix (multi-use support in Combine) could be
backported.
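
For readers without an AArch64 toolchain, here is a portable scalar sketch of the two code shapes discussed above: a broadcast of lane 0 into a temporary followed by a plain vector multiply-add (the "dup" form), and a multiply-add that reads the lane directly (the by-element form). The type vec4 and the functions fma_via_dup, fma_by_lane and shapes_agree are hypothetical names made up for this illustration and are not part of arm_neon.h or GCC. It only models the arithmetic, assuming unfused multiply-add on exactly representable inputs; it says nothing about what any compiler version actually emits.

```c
/* Hypothetical scalar stand-in for float32x4_t (illustration only). */
typedef struct { float v[4]; } vec4;

/* Shape 1: broadcast one lane of b into a temporary (the job a DUP
   instruction does), then do a plain element-wise multiply-add. */
static vec4 fma_via_dup(vec4 acc, vec4 a, vec4 b, int lane)
{
    vec4 dup;
    for (int i = 0; i < 4; i++)
        dup.v[i] = b.v[lane];
    for (int i = 0; i < 4; i++)
        acc.v[i] += a.v[i] * dup.v[i];
    return acc;
}

/* Shape 2: by-element form, reading the lane directly with no
   temporary. This models the arithmetic of vfmaq_laneq_f32, ignoring
   the fact that the real intrinsic is a fused multiply-add. */
static vec4 fma_by_lane(vec4 acc, vec4 a, vec4 b, int lane)
{
    for (int i = 0; i < 4; i++)
        acc.v[i] += a.v[i] * b.v[lane];
    return acc;
}

/* Mirrors the reduced test case: two accumulators share the same
   coefficient b[0]. Returns 1 when both shapes give identical results
   for both accumulators, 0 otherwise. Inputs are chosen so that every
   product and sum is exactly representable in float. */
int shapes_agree(void)
{
    vec4 a  = {{1.0f, 2.0f, 3.0f, 4.0f}};
    vec4 b  = {{0.5f, 9.0f, 9.0f, 9.0f}};
    vec4 c0 = {{0.0f, 0.0f, 0.0f, 0.0f}};
    vec4 c1 = {{8.0f, 8.0f, 8.0f, 8.0f}};

    vec4 d0 = fma_via_dup(c0, a, b, 0);
    vec4 d1 = fma_via_dup(c1, a, b, 0);
    vec4 e0 = fma_by_lane(c0, a, b, 0);
    vec4 e1 = fma_by_lane(c1, a, b, 0);

    for (int i = 0; i < 4; i++)
        if (d0.v[i] != e0.v[i] || d1.v[i] != e1.v[i])
            return 0;
    return 1;
}
```

Since both shapes compute the same values, the broadcast temporary is redundant no matter how many accumulators reuse the coefficient, which is why the extra dup in the multi-use case is a missed optimization rather than a correctness issue.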