[Bug tree-optimization/80631] [6/7/8 Regression] Compiling with -O3 -mavx2 gives wrong code

jakub at gcc dot gnu.org Fri, 08 Dec 2017 07:56:36 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80631


Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
More complete testcase:

int v[8] = { 77, 1, 79, 3, 4, 5, 6, 7 };

__attribute__((noipa)) void
foo ()
{
  int k, r = -1;
  for (k = 0; k < 8; k++)
    if (v[k] == 77)
      r = k;
  if (r != 0)
    __builtin_abort ();
}

__attribute__((noipa)) void
bar ()
{
  int k, r = 4;
  for (k = 0; k < 8; k++)
    if (v[k] == 79)
      r = k;
  if (r != 2)
    __builtin_abort ();
}

int
main ()
{
  foo ();
  bar ();
  return 0;
}

The conditional reduction handling is buggy.
In foo we emit:
  vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 };
  vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 };
  vect_cst__30 = { -1, -1, -1, -1, -1, -1, -1, -1 };

  <bb 3> [local count: 119292720]:
...
  # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 0, 1, 2, 3, 4, 5, 6, 7 }(2)>
  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
  # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
...
  vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
  vect__1.4_27 = MEM[(int *)vectp_v.2_25];
  vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
...
  <bb 18> [local count: 119292720]:
  # vect_r_3.6_31 = PHI <vect_r_3.6_29(3)>
  stmp_r_3.7_32 = REDUC_MAX (vect_r_3.6_31);
  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? -1 : stmp_r_3.7_32;

vect_cst__30, which appears to be the initial value of the reduction variable r
broadcast into a vector, is unused.
The problem is that, because vect_r_3.1_24 starts out as a zero vector, a
condition match in the first iteration (k == 0) is indistinguishable from no
match at all: both yield a REDUC_MAX of 0, and the emitted code assumes that a
REDUC_MAX of 0 means no match.  In foo, v[0] == 77 is the only match, so
REDUC_MAX is 0 and r is set to -1 instead of 0, which triggers the abort.

In this case (the first-iteration IV value is a constant greater than the
minimum value of the type), it would be enough to initialize the accumulator
with a vector containing any value smaller than the first-iteration IV value,
and to change:
  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? -1 : stmp_r_3.7_32;
to
  stmp_r_3.7_33 = stmp_r_3.7_32 == the_chosen_value ? -1 : stmp_r_3.7_32;
Or, in the special case where the reduction variable is initialized before the
loop to a value smaller than that minimum IV value, we could build a vector of
that value and avoid the COND_EXPR on the REDUC_MAX result altogether.
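Both variants of that fix can be sketched with the same lane emulation. This is a hypothetical C model, not the vectorizer's implementation. It assumes a constant first IV value `first` greater than INT_MIN, 8 int lanes, and a trip count that is a multiple of 8. The names run_lanes, sentinel_fix and init_as_sentinel are made up for illustration.

```c
#include <assert.h>
#include <limits.h>

#define VF 8 /* lanes per vector; assumes 8 x int, as with -mavx2 */

/* Hypothetical helper: run the emulated vector loop with the accumulator
   initialized to { init, ..., init } and the IV vector to
   { first, first + 1, ... }; v[i] is visited when the IV equals first + i.
   Returns REDUC_MAX of the accumulator.  */
static int run_lanes (const int *v, int n, int needle, int first, int init)
{
  int iv[VF], acc[VF];
  assert (n % VF == 0);            /* no epilogue loop in this sketch */
  for (int l = 0; l < VF; l++)
    {
      iv[l] = first + l;
      acc[l] = init;
    }
  for (int i = 0; i < n; i += VF)
    for (int l = 0; l < VF; l++)
      {
        acc[l] = v[i + l] == needle ? iv[l] : acc[l];  /* VEC_COND_EXPR */
        iv[l] += VF;
      }
  int max = acc[0];                /* REDUC_MAX */
  for (int l = 1; l < VF; l++)
    if (acc[l] > max)
      max = acc[l];
  return max;
}

/* General case: start the lanes at a sentinel below the constant first
   IV value and compare the REDUC_MAX result against that sentinel.  */
static int sentinel_fix (const int *v, int n, int needle, int first, int r_init)
{
  assert (first > INT_MIN);
  int sentinel = first - 1;        /* any value < first would do */
  int max = run_lanes (v, n, needle, first, sentinel);
  return max == sentinel ? r_init : max;
}

/* Special case: r_init < first, so r_init itself is a usable sentinel and
   no COND_EXPR is needed on the REDUC_MAX result.  */
static int init_as_sentinel (const int *v, int n, int needle, int first,
                             int r_init)
{
  assert (r_init < first);
  return run_lanes (v, n, needle, first, r_init);
}
```

For foo, sentinel_fix (v, 8, 77, 0, -1) and init_as_sentinel (v, 8, 77, 0, -1) both return 0. The second form does not apply to bar, because its initial r = 4 is not below the first IV value.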

Now, if the first-iteration IV value is constant but equal to the minimum value
of the type, we can't use this trick.  In that case we could perhaps bias the
IV by one; e.g., if the reduction is done in an unsigned type, emit:
  # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 1, 2, 3, 4, 5, 6, 7, 8 }(2)>
  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
  # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
...
  vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
  vect__1.4_27 = MEM[(int *)vectp_v.2_25];
  vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
...
  <bb 18> [local count: 119292720]:
  # vect_r_3.6_31 = PHI <vect_r_3.6_29(3)>
  stmp_r_3.7_32 = REDUC_MAX (vect_r_3.6_31);
  stmp_r_3.7_34 = stmp_r_3.7_32 - 1;
  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? <original_r_value> : stmp_r_3.7_34;
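A hypothetical C emulation of the bias-by-one idea for an unsigned reduction whose first IV value is 0, the minimum of the type. The lanes hold k + 1, so 0 can only mean "no match", and the bias is removed after the loop. The function name biased_unsigned is made up. The sketch assumes 8 lanes, a trip count that is a multiple of 8, and that k + 1 never wraps.

```c
#include <assert.h>

#define VF 8 /* lanes per vector; assumes 8 lanes, as with -mavx2 */

/* Bias-by-one sketch: IV lanes start at { 1, ..., 8 } instead of
   { 0, ..., 7 }, so a zero accumulator lane means "never matched".
   Assumes k + 1 does not wrap around for any iteration.  */
static unsigned biased_unsigned (const int *v, unsigned n, int needle,
                                 unsigned r_init)
{
  unsigned iv[VF], acc[VF];
  assert (n % VF == 0);                 /* no epilogue loop in this sketch */
  for (unsigned l = 0; l < VF; l++)
    {
      iv[l] = l + 1;                    /* biased IV: k + 1 */
      acc[l] = 0;                       /* 0 = condition never matched */
    }
  for (unsigned i = 0; i < n; i += VF)
    for (unsigned l = 0; l < VF; l++)
      {
        acc[l] = v[i + l] == needle ? iv[l] : acc[l];  /* VEC_COND_EXPR */
        iv[l] += VF;
      }
  unsigned max = acc[0];                /* REDUC_MAX */
  for (unsigned l = 1; l < VF; l++)
    if (acc[l] > max)
      max = acc[l];
  return max == 0 ? r_init : max - 1;   /* undo the bias after the loop */
}
```

A match at k == 0 now produces 1 in its lane rather than 0, so the final comparison against 0 is unambiguous.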

For a non-constant first IV value we currently emit really weird code, e.g. for:
int v[8] = { 77, 1, 79, 3, 4, 5, 6, 7 };

__attribute__((noipa)) void
foo (int *v, int f)
{
  int k, r = -1;
  for (k = f; k < f + 8; k++)
    if (v[k] == 77)
      r = k;
  if (r != 0)
    __builtin_abort ();
}

__attribute__((noipa)) void
bar (int *v, int f)
{
  int k, r = 4;
  for (k = f; k < f + 8; k++)
    if (v[k] == 79)
      r = k;
  if (r != 2)
    __builtin_abort ();
}

int
main ()
{
  foo (v, 0);
  bar (v, 0);
  return 0;
}

Here we emit two VEC_COND_EXPRs and two REDUC_MAXes.  While this testcase
passes, I'm not really sure the code is correct in general, and furthermore it
seems unnecessarily complicated to me.  Can't we just emit what we'd emit for an
unsigned conditional reduction whose first iteration is 1, and adjust the result
only after the vectorized loop?
So, for foo in the second testcase, emit:

  vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 };
  vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 };

  <bb 3> [local count: 119292720]:
...
  # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 1, 2, 3, 4, 5, 6, 7, 8 }(2)>
  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
  # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
...
  vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
  vect__1.4_27 = MEM[(int *)vectp_v.2_25];
  vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
...
  <bb 18> [local count: 119292720]:
  # vect_r_3.6_31 = PHI <vect_r_3.6_29(3)>
  stmp_r_3.7_32 = REDUC_MAX (vect_r_3.6_31);
  stmp_r_3.7_34 = f_9(D) + (stmp_r_3.7_32 - 1) * step;
  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? <r_value_before_loop> : stmp_r_3.7_34;
where _22, _24 and _29 would all be vectors of unsigned_type_for (r)?
Or, for signed types, start with { min, min, ... } as the "condition never
matched" value and use { min+1, min+2, min+3, ... } as the initial _22 value?
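Both proposals count iterations in the lanes and map the result back to k only after the loop. They can be modelled in C as below. This is a hypothetical sketch, not GCC code. The names unsigned_counter and signed_counter are made up. The scalar IV is assumed to be k = f + j * step for iteration j, with 8 int lanes, a trip count that is a multiple of 8, and no overflow in f + j * step.

```c
#include <assert.h>
#include <limits.h>

#define VF 8 /* lanes per vector; assumes 8 x int, as with -mavx2 */

/* Unsigned variant: lanes hold the biased iteration number j + 1,
   0 means "never matched", and k = f + (max - 1) * step is computed
   only after the loop.  */
static int unsigned_counter (const int *v, int f, int step, unsigned niters,
                             int needle, int r_init)
{
  unsigned iv[VF], acc[VF];
  assert (niters % VF == 0);           /* no epilogue loop in this sketch */
  for (unsigned l = 0; l < VF; l++)
    {
      iv[l] = l + 1;                   /* { 1, 2, ..., 8 } */
      acc[l] = 0;
    }
  for (unsigned j = 0; j < niters; j += VF)
    for (unsigned l = 0; l < VF; l++)
      {
        int k = f + (int) (j + l) * step;          /* scalar IV of lane */
        acc[l] = v[k] == needle ? iv[l] : acc[l];  /* VEC_COND_EXPR */
        iv[l] += VF;
      }
  unsigned max = acc[0];               /* REDUC_MAX */
  for (unsigned l = 1; l < VF; l++)
    if (acc[l] > max)
      max = acc[l];
  return max == 0 ? r_init : f + (int) (max - 1) * step;
}

/* Signed variant: INT_MIN is the "never matched" value and the lanes
   start at { INT_MIN + 1, INT_MIN + 2, ... }.  */
static int signed_counter (const int *v, int f, int step, int niters,
                           int needle, int r_init)
{
  int iv[VF], acc[VF];
  assert (niters % VF == 0);
  for (int l = 0; l < VF; l++)
    {
      iv[l] = INT_MIN + 1 + l;         /* { min+1, min+2, ... } */
      acc[l] = INT_MIN;                /* { min, min, ... } */
    }
  for (int j = 0; j < niters; j += VF)
    for (int l = 0; l < VF; l++)
      {
        int k = f + (j + l) * step;
        acc[l] = v[k] == needle ? iv[l] : acc[l];
        iv[l] += VF;
      }
  int max = acc[0];
  for (int l = 1; l < VF; l++)
    if (acc[l] > max)
      max = acc[l];
  /* max - (INT_MIN + 1) is the iteration number j of the last match.  */
  return max == INT_MIN ? r_init : f + (max - (INT_MIN + 1)) * step;
}
```

In both variants the vector loop only needs the iteration number, whatever f and step are. The final scalar adjustment then replaces the second VEC_COND_EXPR/REDUC_MAX pair. For the second testcase with f = 0 and step = 1, both return 0 for foo and 2 for bar.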

[Bug tree-optimization/80631] [6/7/8 Regression] Compiling with -O3 -mavx2 gives wrong code
