http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56812



--- Comment #12 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-04-08 14:03:45 UTC ---

Richard,



We found another issue related to your fix (r196872). For the attached
test case t1.c, vect_gen_niters_for_prolog_loop() now uses a non-invariant
pointer (v1) to calculate the number of prolog iterations. Before your fix
it used an invariant pointer (x), so all of these computations could be
hoisted out of the outermost loop:



before your fix

  <bb 6>:
  niters.3_17 = (unsigned int) len_7;
  vect_px.4_4 = x_24(D);
  _119 = (unsigned long) vect_px.4_4;
  _118 = _119 & 31;
  _117 = _118 >> 2;
  _116 = -_117;
  _115 = (unsigned int) _116;
  _114 = _115 & 7;
  prolog_loop_niters.5_52 = MIN_EXPR <niters.3_17, _114>;



after your fix



  <bb 6>:
  niters.3_17 = (unsigned int) len_7;
  vect_pv1.4_4 = v1_16;
  _119 = (unsigned long) vect_pv1.4_4;
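For reference, the address arithmetic in the "before" dump can be read as the C sketch below. It is only an illustration, not code from the vectorizer: the helper name prolog_niters is hypothetical, and the interpretation of the constants (32-byte vector alignment, 4-byte elements, 8 elements per vector) is an assumption inferred from the `& 31`, `>> 2` and `& 7` operations and the -mavx option.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the GIMPLE sequence in <bb 6>.
   Assumption: 32-byte vectors of 4-byte elements (8 per vector).
   Returns how many scalar iterations are peeled into the prolog so
   that the access at `addr` becomes aligned, capped by `niters`.  */
static unsigned int
prolog_niters (uintptr_t addr, unsigned int niters)
{
  uintptr_t misalign = addr & 31;      /* _118: byte offset within a vector */
  uintptr_t elems = misalign >> 2;     /* _117: same offset in elements */
  uintptr_t neg = -elems;              /* _116: unsigned negation, wraps */
  unsigned int peel = (unsigned int) neg & 7;  /* _115, _114 */

  return niters < peel ? niters : peel;        /* MIN_EXPR */
}
```

When the address comes from the invariant pointer x, the whole function depends only on loop-invariant inputs and can be evaluated once; when it comes from v1, it has to be re-evaluated whenever v1 changes.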



This change leads to a 7% performance regression on 482.sphinx3 from
SPEC2006, because the prolog computation is no longer hoisted and the
outer loop has far more iterations (4096) than the inner loop (13).



This can be reproduced with the following options:



  -O3 -funroll-loops -ffast-math -march=corei7 -mavx
