Issue 60200
Summary [AArch64] multiply-by-parts idiom could be recognized
Labels
Assignees
Reporter rose00
    reproducer: https://gcc.godbolt.org/z/xsTP44xvn

The two functions compute the same result, a 64x64-to-128 bit unsigned multiply.  The portable version (that avoids `__int128`) does fold to good code like the non-portable one on either x86 or aarch64.

The portable version is modeled on:

https://github.com/Cyan4973/xxHash/blob/8e5fdcbe70687573265b7154515567ee7ca0645c/xxh3.h#L338

Since mul and umulh are cheap, there is a special advantage to
recognizing and simplify portable code that performs these
operations by parts using 32-to-64-bit multiplies.

For comparison, similar simplifications target rbit and ror.
https://gcc.godbolt.org/z/Ghb3rfP5W

Also popcnt is combined into the intrinsic from the portable code.

The existing implementations are in places like
InstCombine/InstCombineAndOrXor.cpp
and
AggressiveInstCombine/AggressiveInstCombine.cpp.

Handling the present use case would require recognition that a 128-bit multiply is happening, but based only on observations of 64-bit values.  Both 64-bit values, built up from 32-bit temporary values, would get replaced by a full 128-bit combo value, which would then be split up again (or not).

It seems to me that the "bit provenance" algorithm which recognizes bit-reverses might possible be adaptable to a "scaled term" provenance algorithm that would enable a pattern matcher to verify that a nest of shift/multiply/mask nodes actually did add up to 128 bits of product.

This is my first bug report.  While I do realize that "missed optimizations" are not really bugs, I did not find in the user guide a way to make a record of suggestions like this.  I hope this is actionable, perhaps after moving into whatever bucket such things should actually go into.
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs

Reply via email to