https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
      Known to work|                            |7.2.1
            Version|tree-ssa                    |8.0
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2018-01-05
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
            Summary|Missing optimization:       |[8 Regression] Missing
                   |useless instructions should |optimization: useless
                   |be dropped                  |instructions should be
                   |                            |dropped
   Target Milestone|---                         |8.0

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Works fine with GCC 7.  I suppose unroller limits hit and/or we're just lucky
that GCC 7 doesn't vectorize the reduction loop ...

Trunk has

t.C:16:20: note: loop vectorized
t.C:21:10: note: basic block vectorized

resulting in

  <bb 2> [local count: 178992762]:
  MEM[(int *)&arr] = { 5, 4, 3, 2 };
  t_2 = arr[0];
  _65 = arr[1];
  _46 = MEM[(int *)&arr + 8B];
  MEM[(int *)&arr] = _46;
  arr[2] = 1;
  arr[3] = t_2;
  vect__2.5_38 = MEM[(int *)&arr];
  vect_sum_21.8_30 = VEC_PERM_EXPR <vect__2.5_38, { 0, 0, 0, 0 }, { 2, 3, 4, 5 }>;
  vect_sum_21.8_15 = vect_sum_21.8_30 + vect__2.5_38;
  vect_sum_21.8_59 = VEC_PERM_EXPR <vect_sum_21.8_15, { 0, 0, 0, 0 }, { 1, 2, 3, 4 }>;
  vect_sum_21.8_60 = vect_sum_21.8_15 + vect_sum_21.8_59;
  stmp_sum_21.7_61 = BIT_FIELD_REF <vect_sum_21.8_60, 32, 0>;
  sum_27 = stmp_sum_21.7_61 + _65;
  _23 = (unsigned int) sum_27;
  arr ={v} {CLOBBER};
  return _23;

while GCC 7 simply unrolls the loop.  DOM is not able to simplify the vector
load from MEM[(int *)&arr] but the scalar loads from the unrolled variant.
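
For readers unfamiliar with the vectorized reduction in the dump, here is a rough scalar model of the two VEC_PERM_EXPR / add steps followed by the BIT_FIELD_REF extraction. It is only a sketch of the semantics, not GCC code: the names V4, vec_perm, vec_add and reduce_add are hypothetical, and it assumes that BIT_FIELD_REF <v, 32, 0> reads lane 0 of a four-lane int vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 4 x int vector register.
using V4 = std::array<int, 4>;

// Model of VEC_PERM_EXPR <a, b, sel>: lane i of the result is element
// sel[i] of the 8-element concatenation of a and b.
V4 vec_perm(const V4& a, const V4& b, const V4& sel) {
    V4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        assert(sel[i] >= 0 && sel[i] < 8);
        r[i] = sel[i] < 4 ? a[sel[i]] : b[sel[i] - 4];
    }
    return r;
}

// Lane-wise addition.
V4 vec_add(const V4& a, const V4& b) {
    V4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r[i] = a[i] + b[i];
    }
    return r;
}

// Shift-and-add reduction mirroring the statements in the dump.
int reduce_add(const V4& v) {
    const V4 zero{0, 0, 0, 0};
    // { 2, 3, 4, 5 }: move the upper two lanes down, fill with zeros.
    V4 upper = vec_perm(v, zero, V4{2, 3, 4, 5});
    V4 s1 = vec_add(upper, v);
    // { 1, 2, 3, 4 }: shift down by one lane, fill with zero.
    V4 shifted = vec_perm(s1, zero, V4{1, 2, 3, 4});
    V4 s2 = vec_add(s1, shifted);
    // Lane 0 now holds the sum of all four input lanes.
    return s2[0];
}
```

Calling reduce_add on { 1, 2, 3, 4 } walks through { 4, 6, 3, 4 } and then { 10, 9, 7, 4 }, so lane 0 carries the full sum. All of this is computable at compile time once the contents of arr are known, which is what the unrolled scalar variant lets DOM discover.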