https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96305
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What            |Removed                  |Added
----------------------------------------------------------------------------
Summary         |Unnecessary signed x     |not detecting widen
                |unsigned multiplication  |multiple after a widen
                |with squares of signed   |multiply with shift
                |variables                |
Target          |arm-*-*                  |arm-*-*, aarch64-*-*
Status          |UNCONFIRMED              |NEW
Ever confirmed  |0                        |1
Last reconfirmed|                         |2021-09-27
Component       |target                   |tree-optimization
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is really a gimple-level issue.
We recognize the first widening multiply with shift, but not the second one:
_10 = a_2(D) w* a_2(D);
_11 = _10 >> 32;
_3 = (long long int) b_4(D);
_6 = _3 * _11;
_7 = _6 >> 32;
_8 = (int) _7;
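For reference, a source of roughly the following shape would be expected to give a dump like the one above. This is a hypothetical reconstruction from the gimple, not the reporter's test case; the name compute_plain is made up here. In the dump, _11 (the shifted 64-bit value) feeds the second multiply directly as a long long, so that multiply no longer looks like a product of two values extended from int, which is presumably why w* is not formed for it.

```c
/* Hypothetical reconstruction; relies on arithmetic right shift of
   negative values, which GCC and Clang provide. */
static inline int hmull(int a, int b)
{
    /* High 32 bits of the signed 64-bit product. */
    return (int)(((long long)a * b) >> 32);
}

int compute_plain(int a, int b)
{
    /* No barrier between the calls: after inlining, the intermediate
       int truncation can fold away, leaving a plain 64-bit multiply. */
    return hmull(hmull(a, a), b);
}
```

For example, compute_plain(0x40000000, 16) first takes the high half of 2^60, which is 2^28, then the high half of 2^28 * 16 = 2^32, which is 1.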
The issue shows up on aarch64 too.
If we do this:
inline int hmull(int a, int b) {
  return ((long long)a * b) >> 32;
}
int compute(int a, int b) {
  int t = hmull(a, a);
  asm("" : "+r"(t));
  return hmull(t, b);
}
------- CUT ----
On aarch64 we get:
smull x0, w0, w0
asr x2, x0, 32
smull x0, w1, w2
lsr x0, x0, 32
ret
which is exactly what we want.
And on arm we get:
smull r3, r0, r0, r0
smull r1, r0, r1, r0
bx lr
Gimple level:
_11 = a_2(D) w* a_2(D);
_12 = _11 >> 32;
_13 = (int) _12;
__asm__("" : "=r" t_4 : "0" _13);
_7 = b_5(D) w* t_4;
_8 = _7 >> 32;
_9 = (int) _8;
Notice the w* on the second multiply there :).
Note that the inline asm helps clang too.
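A minimal sketch for checking that the empty asm only acts as an optimization barrier and does not change the result. The function names here are hypothetical, and the asm syntax is the GNU extension, so this assumes GCC or Clang:

```c
/* Sketch only: both variants should return the same value; the empty
   asm just forces the intermediate result to live in an int register. */
static inline int hmull_hi(int a, int b)
{
    /* High 32 bits of the signed 64-bit product. */
    return (int)(((long long)a * b) >> 32);
}

int compute_barrier(int a, int b)
{
    int t = hmull_hi(a, a);
    __asm__("" : "+r"(t));   /* emits no instructions; hides t's origin */
    return hmull_hi(t, b);
}

int compute_nobarrier(int a, int b)
{
    return hmull_hi(hmull_hi(a, a), b);
}
```

Comparing the assembly of the two functions at -O2 is one way to see whether the second multiply was recognized as a widening multiply.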