https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121290
--- Comment #5 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
In gimple that's

  <bb 10> [local count: 108459]:
  x_22 = a[0];
  _69 = {x_22, x_22, x_22, x_22};

  <bb 4> [local count: 10737416]:
  # ivtmp_83 = PHI <ivtmp_84(11), 0(10)>

  <bb 5> [local count: 1063004408]:
  # i_43 = PHI <i_24(12), 0(4)>
  # ivtmp_34 = PHI <ivtmp_33(12), 32000(4)>
  # vect_x_36.8_70 = PHI <vect_x_9.10_72(12), _69(4)>
  # vect_vec_iv_.12_76 = PHI <_77(12), { 0, 0, 0, 0 }(4)>
  # vect_index_39.13_78 = PHI <vect_index_12.14_79(12), { 0, 0, 0, 0 }(4)>
  _4 = a[i_43];
  vect_cst__68 = {_4, _4, _4, _4};
  mask__16.9_71 = vect_cst__68 > vect_x_36.8_70;
  vect_index_12.14_79 = VEC_COND_EXPR <mask__16.9_71, vect_vec_iv_.12_76, vect_index_39.13_78>;
  vect_x_9.10_72 = VEC_COND_EXPR <mask__16.9_71, vect_cst__68, vect_x_36.8_70>;
  i_24 = i_43 + 1;
  ivtmp_33 = ivtmp_34 - 1;
  _77 = vect_vec_iv_.12_76 + { 1, 1, 1, 1 };
  if (ivtmp_33 != 0)
    goto <bb 12>; [98.99%]
  else
    goto <bb 8>; [1.01%]

The SLP tree seems to mostly be working on lanes of externals:

note: Vectorizing SLP tree:
note: node 0x42900120 (max_nunits=4, refcnt=1) vector(4) float
note: op template: x_41 = PHI <x_9(5)>
note:   [l] stmt 0 x_41 = PHI <x_9(5)>
note:   children 0x429001c8
note: node 0x429001c8 (max_nunits=4, refcnt=2) vector(4) float
note: op template: x_9 = _16 ? _4 : x_36;
note:   stmt 0 x_9 = _16 ? _4 : x_36;
note:   children 0x42900270 0x42900318 0x429003c0
note: node 0x42900270 (max_nunits=4, refcnt=2) vector(4) <signed-boolean:32>
note: op template: _16 = _4 > x_36;
note:   stmt 0 _16 = _4 > x_36;
note:   children 0x42900318 0x429003c0
note: node 0x42900318 (max_nunits=4, refcnt=2) vector(4) float
note: op template: _4 = a[i_43];
note:   stmt 0 _4 = a[i_43];
note: node 0x429003c0 (max_nunits=4, refcnt=2) vector(4) float
note: op template: x_36 = PHI <x_9(12), x_22(4)>
note:   stmt 0 x_36 = PHI <x_9(12), x_22(4)>
note:   children 0x429001c8 0x42900468
note: node (external) 0x42900468 (max_nunits=1, refcnt=1) vector(4) float
note:   { x_22 }

It also looks like we missed simplifying a > b ? a : b into just a max.

Before, we failed during analysis in the block that was removed:

missed: Unsupported loop-closed phi in outer-loop.
missed: bad operation or unsupported loop bound

and now it's a costing issue, as it's an inner loop.

You can reduce it down to

  #define iterations 100000
  #define LEN_1D 32000

  float a[LEN_1D];

  int main()
  {
    float x;
    for (int nl = 0; nl < iterations; nl++)
      {
        x = a[0];
        for (int i = 0; i < LEN_1D; ++i)
          {
            if (a[i] > x)
              {
                x = a[i];
              }
          }
      }
    return x > 1;
  }

It looks like the access of a[0] in the outer loop is making it treat the inner
loop as only being able to access one element at a time.
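
To make the "one element at a time" point concrete, here is a rough lane-by-lane model in plain C of what the vector loop in the dump above computes. It is only a sketch for reading the GIMPLE, not compiler output: the function names max_scalar and max_splat_model and the LANES macro are made up for illustration, the index vectors (vect_index_*, vect_vec_iv_*) are left out, and since the dump does not show the epilogue, returning lane 0 is an assumption that relies on all lanes holding the same value.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* vector(4) float in the dump */

/* Scalar reference: the inner loop of the reduced test case. */
float max_scalar(const float *a, size_t n)
{
    assert(n > 0);
    float x = a[0];
    for (size_t i = 0; i < n; ++i)
        if (a[i] > x)
            x = a[i];
    return x;
}

/* Hypothetical model of the vector loop: each iteration still loads a
   single scalar a[i] and splats it into every lane.  */
float max_splat_model(const float *a, size_t n)
{
    assert(n > 0);
    float vect_x[LANES];
    for (int l = 0; l < LANES; ++l)
        vect_x[l] = a[0];                 /* _69 = {x_22, x_22, x_22, x_22} */

    for (size_t i = 0; i < n; ++i) {      /* i_24 = i_43 + 1, not + 4 */
        float vect_cst[LANES];
        for (int l = 0; l < LANES; ++l)
            vect_cst[l] = a[i];           /* vect_cst__68 = {_4, _4, _4, _4} */

        for (int l = 0; l < LANES; ++l) {
            int mask = vect_cst[l] > vect_x[l];          /* mask__16.9_71 */
            vect_x[l] = mask ? vect_cst[l] : vect_x[l];  /* VEC_COND_EXPR */
        }
    }

    /* Assumption: the epilogue is not in the dump; all lanes are equal
       here, so any lane gives the result.  */
    return vect_x[0];
}
```

Both functions return the same value for any non-empty input, and the model runs the same number of iterations as the scalar loop while doing four times the compare/select work per iteration, which is consistent with this showing up as a costing problem.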