https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81082

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So is there anything we can do here?
Isn't the bigger problem that we no longer notice the multiplications can't
overflow?  With the int multiplications gone, we need to assume [1, INT_MAX]
ranges for b1, b2 and b3 in the loop (otherwise it wouldn't loop at all), and
[0, INT_MAX - 1] ranges for i1, i2 and i3.  But from the expression
i1 * b2 * b3 + i2 * b3 + (i3 - 1) we could have determined that b1 * b2 * b3
is in [0, INT_MAX] (with i1 = b1 - 1, i2 = b2 - 1 and i3 = b3 - 1 the index
evaluates to b1 * b2 * b3 - 2, and neither it nor the intermediate sums may
overflow).

We do vectorize:
int
f (int *x, int b1, int b2, int b3)
{
  int foo = 0;
  for (int i1 = 0; i1 < b1; ++i1)
    for (int i2 = 0; i2 < b2; ++i2)
      for (int i3 = 0; i3 < b3; ++i3)
        foo += x[(i1 * b2 + i2) * b3 + (i3 - 1)];
  return foo;
}
without gather loads, while the testcase from #c0 vectorizes only with them
(if available).
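
For reference, the #c0 loop presumably uses the flattened index directly,
i.e. something like this (reconstructed from the index expression quoted
above, not copied verbatim from #c0):
int
f (int *x, int b1, int b2, int b3)
{
  int foo = 0;
  for (int i1 = 0; i1 < b1; ++i1)
    for (int i2 = 0; i2 < b2; ++i2)
      for (int i3 = 0; i3 < b3; ++i3)
        foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)];
  return foo;
}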

Regarding the PR66313 optimization, first of all, what happened to the plan
to move it to match.pd?
As we vectorize:
int
f (int *x, int b1, int b2, int b3)
{
  int foo = 0;
  for (int i1 = 0; i1 < b1; ++i1)
    for (int i2 = 0; i2 < b2; ++i2)
      for (int i3 = 0; i3 < b3; ++i3)
        {
          int t1 = i1 * b2 * b3;
          int t2 = i2 * b3;
          int t3 = i3 - 1;
          foo += x[t1 + t2 + t3];
        }
  return foo;
}

it seems that move hasn't been done.
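
A quick way to check is to compare the optimized dumps for a single-expression
and a split form (a hypothetical test, not taken from the PR):

/* Compile with -O2 -fdump-tree-optimized and compare the dumps.  g1 is
   presumably factored to a * (b + c) (in unsigned arithmetic) already at
   GENERIC by fold_plusminus_mult_expr; if the fold lived in match.pd, g2
   should end up factored the same way.  */
int g1 (int a, int b, int c) { return a * b + a * c; }
int g2 (int a, int b, int c) { int ab = a * b; int ac = a * c; return ab + ac; }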

Second thing, in fold_plusminus_mult_expr I don't understand the
      /* We are neither factoring zero nor minus one.  */
      || TREE_CODE (same) == INTEGER_CST)
part; shouldn't that be
      || (TREE_CODE (same) == INTEGER_CST
          && !integer_zerop (same)
          && !integer_minus_onep (same))
instead?

Another thing is that for signed a, b, c we can't optimize a * b + a * c as
a * (b + c) if a can be -1 or 0.  Exhaustive proof for signed char:
int
main ()
{
  int a, b, c;
  for (a = -128; a < 128; a++)
    for (b = -128; b < 128; b++)
      for (c = -128; c < 128; c++)
        {
          int ab = a * b;
          int ac = a * c;
          int ab_ac = ab + ac;
          if (ab < -128 || ab >= 128 || ac < -128 || ac >= 128
              || ab_ac < -128 || ab_ac >= 128)
            continue;
          /* Ok, in this case a * b + a * c is without UB.  */
          int b_c = b + c;
#ifdef NEW
          /* Simulate the addition being done in the narrow (wrapping)
             type, as the unsigned variants discussed below would do.  */
          b_c = (signed char) b_c;
#endif
          int r = a * b_c;
          if (b_c < -128 || b_c >= 128 || r != ab_ac)
            __builtin_printf ("%d * %d + %d * %d = %d wrong opt (%d)\n",
                              a, b, a, c, ab_ac, r);
        }
  return 0;
}
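
(Compiled and run without -DNEW this should complain about the a == -1 and
a == 0 cases; with -DNEW, i.e. with the addition wrapping in the narrow type,
only the a == -1 cases should remain.)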

But, if we know for sure that a is not -1, we could optimize it as
  a * (signed)((unsigned) b + c)
(which doesn't work for a == -1, but works for all other values) instead of
  (signed)(a * ((unsigned) b + c))
(which works for all values, but is too big a hammer).
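
In C terms the two variants would look like this (just a sketch; the casts
back to int are implementation-defined in ISO C, but well defined as modulo
reduction in GCC):

/* Valid whenever a != -1: only the addition is carried out in unsigned
   (wrap-around) arithmetic, the multiplication stays signed, so the
   knowledge that the product cannot overflow survives.  */
int v1 = a * (int) ((unsigned) b + (unsigned) c);

/* Valid for every a, but the whole computation becomes unsigned and the
   no-overflow information about the multiplication is lost -- the
   too-big hammer.  */
int v2 = (int) ((unsigned) a * ((unsigned) b + (unsigned) c));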

Perhaps this optimization should be moved over to match.pd, deferred until at
least vrp1 when a isn't constant, and tuned based on range info?
The range of b3, at least during the vrp pass itself, is [1, INT_MAX], so
neither -1 nor 0 is possible and we could safely reassociate as a * (b + c)
rather than using the unsigned variants.

Or do we want to somehow mark the MULT_EXPR we've created, propagate the mark
through the IL, and if vrp notices a MULT_EXPR (or IFN) marked this way, fold
it as an integer multiplication?  That would be ugly.
