Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]

Victor Tong via Gcc-patches Wed, 02 Jun 2021 13:55:25 -0700

Hi Richard,

Thanks for reviewing my patch. I did a search online and you're right -- there
isn't a vector modulo instruction. I'll restrict the new X * (Y / X) --> Y - (Y % X)
pattern and the existing X - (X / Y) * Y --> X % Y pattern so that neither
triggers on vector types.
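
For reference, the two scalar identities can be checked with a few lines of C.
This is only a sketch written for illustration, not part of the patch: the
helper names are made up, and it relies on C99 truncating division and on
inputs small enough that nothing overflows.

```c
#include <assert.h>

/* X * (Y / X): the form the new pattern matches. */
static int mul_of_div(int x, int y) { return x * (y / x); }

/* Y - (Y % X): the form the new pattern rewrites it to. */
static int sub_of_mod(int x, int y) { return y - (y % x); }

/* X - (X / Y) * Y: the form the existing pattern matches. */
static int sub_of_mul_div(int x, int y) { return x - (x / y) * y; }

/* Exhaustively compare both sides over a small range, skipping a zero divisor. */
static void check_scalar_identities(void)
{
  for (int x = -7; x <= 7; x++)
    {
      if (x == 0)
        continue;
      for (int y = -50; y <= 50; y++)
        {
          assert(mul_of_div(x, y) == sub_of_mod(x, y));
          assert(sub_of_mul_div(y, x) == y % x);
        }
    }
}
```

Both identities follow from (a / b) * b + a % b == a, which C guarantees for
scalar integers whenever a / b is representable.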

I looked into why the following pattern isn't triggering:

  (simplify
   (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
   (view_convert @1))

The nop_converts expand into tree_nop_conversion_p checks. In fn2() of
testsuite/gcc.dg/fold-minus-6.c, the expression during generic matching looks
like:

42 - (long int) (42 - 42 % x)

When looking at the right-hand side of the expression, (long int) (42 - 42 % x),
the tree_nop_conversion_p check fails because of the difference in type
precision: the expression inside the cast has 32-bit precision and the
outer expression has 64-bit precision.
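
A C-level sketch of that shape follows. fn2_like() is a hypothetical stand-in
(the real fn2() body may differ), and the two precision helpers assume the
types have no padding bits. On x86_64-pc-linux-gnu int is 32 bits and long is
64 bits, so the cast is a widening conversion rather than a no-op one.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for fn2(): the inner subtraction is done in int,
   the outer one in long. */
static long fn2_like(int x) { return 42 - (long) (42 - 42 % x); }

/* Width in bits of the inner (int) and outer (long) types,
   assuming no padding bits. */
static unsigned inner_precision(void) { return sizeof(int) * CHAR_BIT; }
static unsigned outer_precision(void) { return sizeof(long) * CHAR_BIT; }

/* The whole expression should still reduce to 42 % x. */
static void check_fn2_like(void)
{
  for (int x = 1; x <= 50; x++)
    assert(fn2_like(x) == 42 % x);
}
```

When the two widths differ, the conversion changes precision, which matches the
failing tree_nop_conversion_p check described above.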

I looked around at other patterns and it seems like nop_convert and 
view_convert are used because of underflow/overflow concerns. I'm not familiar 
with the two constructs. What's the difference between using them and checking 
TYPE_OVERFLOW_UNDEFINED? In the scenario above, since TYPE_OVERFLOW_UNDEFINED 
is true, the second pattern that I added (X - (X - Y) --> Y) gets triggered.
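
To make the overflow concern concrete, here is a small C sketch of why the
conversion width matters for wrapping types. It is an illustration only, not
taken from match.pd or the testsuite, and the function names are made up. With
same-width wrapping arithmetic X - (X - Y) is always Y, but once the inner
result is widened after wrapping, the identity no longer holds.

```c
#include <assert.h>
#include <stdint.h>

/* Same-width wrapping arithmetic: X - (X - Y) is always Y. */
static uint32_t same_width(uint32_t x, uint32_t y)
{
  return x - (x - y);
}

/* The inner subtraction wraps at 32 bits and is then widened to 64 bits,
   so the outer subtraction no longer cancels it when x < y. */
static uint64_t widened(uint32_t x, uint32_t y)
{
  return (uint64_t) x - (uint64_t) (uint32_t) (x - y);
}

/* Check both behaviours on one wrapping and one non-wrapping input. */
static void check_widening(void)
{
  assert(same_width(1, 2) == 2);   /* wraps, but still cancels */
  assert(widened(5, 3) == 3);      /* no wrap: identity holds */
  assert(widened(1, 2) != 2);      /* wrap before widening: identity fails */
}
```

For signed types with undefined overflow the wrapping case cannot legitimately
occur, which would be consistent with the TYPE_OVERFLOW_UNDEFINED variant
triggering here.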

Thanks,
Victor

From: Richard Biener <richard.guent...@gmail.com>
Sent: Tuesday, April 27, 2021 1:29 AM
To: Victor Tong <vit...@microsoft.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
Subject: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed 
by multiply [PR95176] 

On Thu, Apr 1, 2021 at 1:03 AM Victor Tong via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hello,
>
> This patch fixes PR tree-optimization/95176. A new pattern in match.pd was 
> added to transform "a * (b / a)" --> "b - (b % a)". A new test case was also 
> added to cover this scenario.
>
> The new pattern interfered with the existing pattern of "X - (X / Y) * Y". In 
> some cases (such as in fn4() in gcc/testsuite/gcc.dg/fold-minus-6.c), the new 
> pattern is applied causing the existing pattern to no longer apply. This 
> results in worse code generation because the expression is left as "X - (X - 
> Y)". An additional subtraction pattern of "X - (X - Y) --> Y" was added to 
> this patch to avoid this regression.
>
> I also didn't remove the existing pattern because it triggered in more cases 
> than the new pattern because of a tree_invariant_p check that's inserted by 
> genmatch for the new pattern.

Yes, we do not handle using Y multiple times when it might contain
side-effects in GENERIC folding (comments in genmatch suggest we can
use save_expr but we don't implement this [anymore]).

On GIMPLE there's also the issue that your new pattern creates a
complex expression, which makes it fail to be used by value-numbering,
for example, where the old pattern was OK (eventually, if no
conversion was required).

So indeed it looks OK to preserve both.

I wonder why you needed the

+/* X - (X - Y) --> Y */
+(simplify
+ (minus (convert1? @0) (convert2? (minus @@0 @1)))
+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) && TYPE_OVERFLOW_UNDEFINED(type))
+  (convert @1)))

pattern since it should be handled by

  /* Match patterns that allow contracting a plus-minus pair
     irrespective of overflow issues.  */
  /* (A +- B) - A       ->  +- B */
  /* (A +- B) -+ B      ->  A */
  /* A - (A +- B)       -> -+ B */
  /* A +- (B -+ A)      ->  +- B */

in particular

  (simplify
   (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
   (view_convert @1))

If there are supported cases missing, I'd rather extend this pattern than
replicate it.

+/* X * (Y / X) is the same as Y - (Y % X).  */
+(simplify
+ (mult:c (convert1? @0) (convert2? (trunc_div @1 @@0)))
+ (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
+  (minus (convert @1) (convert (trunc_mod @1 @0)))))

Note that if you're allowing vector types you have to use
(view_convert ...) in the transform, and you also need to make sure
that the target can expand the modulo - I suspect that's an issue with
the existing pattern as well. I don't know of any vector ISA that
supports modulo (or integer division, for that matter). Restricting
the patterns to integer types is probably the most sensible solution.

Thanks,
Richard.

> I verified that all "make -k check" tests pass when targeting 
> x86_64-pc-linux-gnu.
>
> 2021-03-31  Victor Tong  <vit...@microsoft.com>
>
> gcc/ChangeLog:
>
>         * match.pd: Two new patterns: One to optimize division followed by
>         multiply and the other to avoid a regression as explained above.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/20030807-10.c: Update existing test to look for a
>         subtraction because a shift is no longer emitted.
>         * gcc.dg/pr95176.c: New test to cover optimizing division followed by
>         multiply.
>
> I don't have write access to the GCC repo but I've completed the FSF 
> paperwork as I plan to make more contributions in the future. I'm looking for 
> a sponsorship from an existing GCC maintainer before applying for write 
> access.
>
> Thanks,
> Victor
