On 11/10/2021 1:35 AM, Richard Biener via Gcc-patches wrote:
On Tue, Nov 9, 2021 at 5:25 PM Navid Rahimi <navidrah...@microsoft.com> wrote:
Hi Richard,
Thank you so much for your detailed feedback. I am attaching another version of
the patch which does include all the changes you mentioned.
Bellow you can see my response to your feedbacks:
the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as
constants come last - you can then remove the 'c'
Fixed. I was not aware of the canonical order.
you can use INTEGRAL_TYPE_P (type).
Fixed. Didn't know about "type" either.
this test is not necessary
Fixed.
But should we also optimize x * (2 + y/x) - y -> 2*x - y % x? So
it looks like a conflict with the x * (1 + b) <-> x + x * b transform
(fold_plusminus_mult)? That said, the special case of one
definitely makes the result cheaper (one less operation).
For this special case, it does remove operation indeed. But I was not able to reproduce it for any other constant [1]. If it was possible
to do it with other constants I would've changed the pattern to have be more general like "x * (C + y/x) - y -> C*x - y % x".
Basically anything other than 1 wasn't a win. Regarding the "x * (1 + b) <-> x + x * b" transformation, it appears to me
when there is a "- y" at the end "x * (1 + b)", there is opportunity to optimize. Without that "- y" I was
not able to make any operation more performant. Either direction, looks like same amount of computation.
1) https://compiler-explorer.com/z/dWsq7Tzf4
Please move the pattern next to the most relatest which I think is
Fixed.
the return value of 1 is an unreliable way to fail, please instead call
__builtin_abort ();
Fixed.
do we really need -O3 for this? I think that -O should be enough here?
We don't specifically need that. But I realized that the optimization can
happen in two different level at the compiler. It seems if you spread the
computation over multiple statement like:
int c = a/b;
int y = b * (1 + c);
return y - a;
instead of :
return b * (1 + a / b) - a;
Then you have to have at least -O1 to have it optimized. Granted, I am not
doing that in the testcase. In the new patch I am changing it to -O. Let me
know if you have any suggestions.
-O is fine, generally at -O0 we shouldn't expect such transforms to
happen (but they still do, of course).
The patch looks OK now.
I've pushed this to the trunk for Navid.
jeff