Issue     | 60200
Summary   | [AArch64] multiply-by-parts idiom could be recognized
Labels    |
Assignees |
Reporter  | rose00
reproducer: https://gcc.godbolt.org/z/xsTP44xvn
The two functions compute the same result, a 64x64-to-128-bit unsigned multiply. The portable version (which avoids `__int128`) does not fold to good code the way the non-portable one does, on either x86 or aarch64.
The portable version is modeled on:
https://github.com/Cyan4973/xxHash/blob/8e5fdcbe70687573265b7154515567ee7ca0645c/xxh3.h#L338
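For concreteness, here is a sketch of the two shapes being compared (my paraphrase, reduced to just the high half; the godbolt link has the actual source):

```c
#include <stdint.h>

/* Non-portable: already folds to mul + umulh on AArch64. */
uint64_t mulhi_u64_int128(uint64_t a, uint64_t b) {
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Portable, xxHash-style: four 32x32->64 partial products combined
 * with shifts, masks, and adds. Today this stays as written instead
 * of being recognized as one widening multiply. */
uint64_t mulhi_u64_portable(uint64_t a, uint64_t b) {
    uint64_t lo_lo = (a & 0xFFFFFFFF) * (b & 0xFFFFFFFF);
    uint64_t hi_lo = (a >> 32)        * (b & 0xFFFFFFFF);
    uint64_t lo_hi = (a & 0xFFFFFFFF) * (b >> 32);
    uint64_t hi_hi = (a >> 32)        * (b >> 32);
    /* Sum the cross terms, keeping the carry out of the low half. */
    uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFF) + lo_hi;
    return (hi_lo >> 32) + (cross >> 32) + hi_hi;
}
```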
Since mul and umulh are cheap, there is a special advantage to recognizing and simplifying portable code that performs these operations by parts using 32-to-64-bit multiplies.
For comparison, similar simplifications target rbit and ror.
https://gcc.godbolt.org/z/Ghb3rfP5W
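For instance, a portable rotate like the following is already recognized (a minimal sketch from memory, not taken from the godbolt link above):

```c
#include <stdint.h>

/* Portable rotate-right. LLVM's instruction combiner recognizes this
 * shift/or shape as a funnel shift, so AArch64 gets a single ror
 * (at least in the configurations I have tried). */
uint64_t rotr64(uint64_t x, unsigned n) {
    return (x >> (n & 63)) | (x << ((64 - n) & 63));
}
```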
Also, portable popcount code is combined into the intrinsic.
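For example, the usual SWAR popcount shape gets rewritten (sketch from memory; I believe the matcher lives in AggressiveInstCombine):

```c
#include <stdint.h>

/* Portable SWAR popcount. As far as I can tell, the masked adds
 * followed by the multiply by 0x0101010101010101 and the shift by 56
 * are matched and rewritten to llvm.ctpop. */
int popcount64(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}
```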
The existing implementations are in places like
InstCombine/InstCombineAndOrXor.cpp
and
AggressiveInstCombine/AggressiveInstCombine.cpp.
Handling the present use case would require recognizing that a 128-bit multiply is happening, based only on observations of 64-bit values. Both 64-bit values, built up from 32-bit temporaries, would get replaced by a single 128-bit combined value, which would then be split apart again (or not).
It seems to me that the "bit provenance" algorithm which recognizes bit-reverses might possibly be adaptable to a "scaled term" provenance algorithm, which would let a pattern matcher verify that a nest of shift/multiply/mask nodes actually adds up to 128 bits of product.
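Concretely, the identity such a matcher would have to establish is the usual schoolbook decomposition; here is a small self-check of it (my own sketch, not proposed code):

```c
#include <stdint.h>
#include <assert.h>

/* With x = xh*2^32 + xl and y = yh*2^32 + yl (all four halves 32-bit),
 *   x*y = xh*yh*2^64 + (xh*yl + xl*yh)*2^32 + xl*yl.
 * Each 32x32->64 partial product carries a scale of 0, 32, or 64 bits;
 * the matcher's job would be to prove that the shift/add/mask nest
 * accumulates exactly these scaled terms, carries included. */
static void check_identity(uint64_t x, uint64_t y) {
    uint64_t xh = x >> 32, xl = (uint32_t)x;
    uint64_t yh = y >> 32, yl = (uint32_t)y;
    unsigned __int128 sum = ((unsigned __int128)(xh * yh) << 64)
                          + ((unsigned __int128)(xh * yl) << 32)
                          + ((unsigned __int128)(xl * yh) << 32)
                          +  (unsigned __int128)(xl * yl);
    assert(sum == (unsigned __int128)x * y);
}
```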
This is my first bug report. While I do realize that "missed optimizations" are not really bugs, I did not find a way in the user guide to record suggestions like this. I hope this is actionable, perhaps after being moved into whatever bucket such things should actually go into.