https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88540

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-12-19
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
          Component|c                           |tree-optimization
             Blocks|                            |53947
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
This happens because, without -ffast-math, the completely unrolled loop
is not if-converted to MIN, so basic-block vectorization fails.  With
loop vectorization we apply if-conversion:

  _1 = (long unsigned int) n_20;
  _2 = _1 * 8;
  _3 = d1_12(D) + _2;
  _4 = *_3;
  _5 = d2_13(D) + _2;
  _6 = *_5;
  iftmp.0_9 = _4 < _6 ? _4 : _6;
  _7 = d3_14(D) + _2;
  *_7 = iftmp.0_9;
  n_16 = n_20 + 1;

and vectorize it as

  vect_iftmp.7_43 = VEC_COND_EXPR <vect__4.3_39 < vect__6.6_42,
                                   vect__4.3_39, vect__6.6_42>;

ending up as

(insn 12 11 13 (set (reg:V2DF 98 [ vect_iftmp.7 ])
        (unspec:V2DF [
                (reg:V2DF 87 [ vect__4.3 ])
                (reg:V2DF 88 [ vect__6.6 ])
            ] UNSPEC_IEEE_MIN)) "t.c":7 -1
     (nil))

and exactly the same assembly as with -ffast-math.

So the issue is either that phiopt does not if-convert the MIN pattern
to a COND_EXPR (when the target has an IEEE MIN we can use), or that
basic-block vectorization does not perform if-conversion on non-loop
code.
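
As a rough source-level picture of what the basic-block vectorizer is
left with after complete unrolling, here is a sketch assuming a trip
count of 2.  The function name and signature are made up for
illustration (the original testcase is not quoted in this comment), and
the real GIMPLE uses PHI nodes rather than named temporaries.  The point
is only that each element keeps its own branch, so the two stores do not
sit in one straight-line block of selects.

```c
/* Hypothetical stand-in for the testcase after complete unrolling,
   assuming SIZE == 2.  Each element is selected by a branch instead of
   a COND_EXPR or MIN, so the code is split across basic blocks.  */
void
min_unrolled (const double *restrict d1, const double *restrict d2,
              double *restrict d3)
{
  double t0, t1;

  /* Element 0: branchy form of d1[0] < d2[0] ? d1[0] : d2[0].  */
  if (d1[0] < d2[0])
    t0 = d1[0];
  else
    t0 = d2[0];
  d3[0] = t0;

  /* Element 1: same pattern, again with its own control flow.  */
  if (d1[1] < d2[1])
    t1 = d1[1];
  else
    t1 = d2[1];
  d3[1] = t1;
}
```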

You can work around this in your code with

#pragma GCC unroll 0
  for (int n = 0; n < SIZE; ++n)
    {
      d3[n] = d1[n] < d2[n] ? d1[n] : d2[n];
    }

keeping the loop and using loop vectorization.
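
For reference, the workaround as a complete translation unit might look
like the sketch below.  The value of SIZE, the function name and the
restrict-qualified signature are assumptions chosen for illustration,
not taken from the report.

```c
/* Assumed small compile-time trip count; the value in the original
   testcase is not quoted in this comment.  */
#define SIZE 2

/* Hypothetical name and signature.  restrict tells the compiler the
   three arrays do not overlap.  */
void
min_keep_loop (const double *restrict d1, const double *restrict d2,
               double *restrict d3)
{
  /* An unroll factor of 0 asks GCC not to unroll the loop, so it is
     still a loop when the loop vectorizer runs.  */
#pragma GCC unroll 0
  for (int n = 0; n < SIZE; ++n)
    {
      d3[n] = d1[n] < d2[n] ? d1[n] : d2[n];
    }
}
```

Compiling this with something like -O3 -fdump-tree-vect-details should
show whether the loop was vectorized; the exact dump contents depend on
the GCC version.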

Note that the backend could implement the fmin/fmax optabs, which would
allow more optimizations.  Also, minmax_replacement in phiopt could make
use of the FMIN/FMAX IFNs when HONOR_NANS || HONOR_SIGNED_ZEROS holds
and the direct IFN is available.
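
To see why NaNs and signed zeros are the cases that matter, the sketch
below shows that the conditional form returns its second operand
whenever the comparison is false, so its result depends on operand
order for those inputs.  Both function names are made up for
illustration.

```c
#include <math.h>    /* NAN, for callers that want to pass a NaN */
#include <string.h>  /* memcmp */

/* The select pattern from the loop body.  When a < b is false, which
   includes any comparison involving a NaN and the comparison of
   +0.0 with -0.0, the second operand is returned.  */
double
cond_min (double a, double b)
{
  return a < b ? a : b;
}

/* Hypothetical helper: returns 1 if swapping the operands changes the
   bit pattern of the result, 0 otherwise.  memcmp is used so that NaN
   results and the sign of zero are compared exactly.  */
int
order_matters (double a, double b)
{
  double x = cond_min (a, b);
  double y = cond_min (b, a);
  return memcmp (&x, &y, sizeof x) != 0;
}
```

For ordinary operands such as (1.0, 2.0) the order does not matter, but
for (1.0, NAN) and (0.0, -0.0) it does, which is why the pattern cannot
simply be treated as a commutative MIN unless NaNs and signed zeros may
be ignored.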


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
