http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812
--- Comment #12 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-04-08 14:03:45 UTC ---
Richard,

We found another issue related to your fix (r196872). For the attached test case t1.c, vect_gen_niters_for_prolog_loop() now uses a non-invariant pointer (v1) to calculate the number of iterations of the prolog loop. Before your fix it used the invariant pointer (x), so all of these computations could be hoisted out of the outermost loop:

before your fix
<bb 6>:
  niters.3_17 = (unsigned int) len_7;
  vect_px.4_4 = x_24(D);
  _119 = (unsigned long) vect_px.4_4;
  _118 = _119 & 31;
  _117 = _118 >> 2;
  _116 = -_117;
  _115 = (unsigned int) _116;
  _114 = _115 & 7;
  prolog_loop_niters.5_52 = MIN_EXPR <niters.3_17, _114>;

after your fix
<bb 6>:
  niters.3_17 = (unsigned int) len_7;
  vect_pv1.4_4 = v1_16;
  _119 = (unsigned long) vect_pv1.4_4;

This leads to a 7% performance regression on 482.sphinx3 from SPEC CPU2006, because the number of iterations of the outer loop (4096) is much greater than the number of iterations of the inner loop (13). It can be reproduced with the following options: -O3 -funroll-loops -ffast-math -march=corei7 -mavx
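The prolog iteration count in the "before" dump is a pure function of the pointer address and len. Below is a minimal C sketch of the same arithmetic, assuming 4-byte (float) elements and 32-byte AVX vectors of 8 lanes, as the masks 31, >> 2 and & 7 suggest. The helper name prolog_niters is hypothetical; this is an illustration, not GCC's code. Because the result depends only on the pointer and the trip count, it is invariant in the outer loop whenever the pointer is (x) and can be computed once; with v1 it must be recomputed on every outer iteration.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the "before" GIMPLE sequence: the number of
   scalar iterations to peel so that a float pointer becomes 32-byte aligned,
   capped at the total trip count.  Assumes 4-byte elements and 8-lane
   (32-byte) vectors. */
static unsigned int prolog_niters(const float *p, unsigned int niters)
{
    uintptr_t addr = (uintptr_t) p;                    /* _119 */
    uintptr_t misalign_bytes = addr & 31;              /* _118 */
    uintptr_t misalign_elems = misalign_bytes >> 2;    /* _117 */
    /* _116 = -_117 (unsigned wraparound), _115 = cast, _114 = & 7 */
    unsigned int peel = (unsigned int) (-misalign_elems) & 7;
    return niters < peel ? niters : peel;              /* MIN_EXPR */
}
```

For example, a pointer 4 bytes past a 32-byte boundary gives a peel count of 7 (one misaligned element, so 8 - 1), and an already aligned pointer gives 0.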