https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522
Kewen Lin <linkw at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
                 CC|                            |rguenth at gcc dot gnu.org
             Status|WAITING                     |RESOLVED
--- Comment #12 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Mathieu Malaterre from comment #11)
> Here is a dead simple reduced version:
>
> ```
> % cat pr111522.cc
> #include <iostream>
> #include <cstring>
> #pragma GCC push_options
> #pragma GCC target "cpu=power10"
> float BitCast(int in) {
> float out;
> memcpy(&out, &in, sizeof(out));
> return out;
> }
> float kNearOneF = BitCast(1065353215);
> #pragma GCC pop_options
> int main() { std::cout << kNearOneF << std::endl; }
> ```
>
> You can compare:
>
> g++ -o works -O2 pr111522.cc -Wall -Wextra -Werror -Wfatal-errors
>
> vs
>
> g++ -o fails -flto -O2 pr111522.cc -Wall -Wextra -Werror -Wfatal-errors
>
> For some reason, `-flto` rightfully generates a `xxspltidp` instruction:
>
> (gdb) display/i $pc
> 1: x/i $pc
> => 0x100000940 <_Z7BitCasti.constprop.0>: xxspltidp vs1,1065353215
>
> I am not sure I understand the behavior of the non LTO case now...
I think this is a test issue. The given source code asks for the function
BitCast to be compiled with -mcpu=power10, so it's valid to generate power10
insns for it and for its specialized clones.
Without LTO, no power10 insn helps the general BitCast, so the generated insns
look like:
0000000010000b10 <_Z7BitCasti>:
10000b10: c6 07 69 78 rldicr r9,r3,32,31
10000b14: 66 01 29 7c mtfprd f1,r9
10000b18: 2c 0d 20 f0 xscvspdpn vs1,vs1
10000b1c: 20 00 80 4e blr
With LTO, function versioning is able to create one specialized function with
the argument fixed to 1065353215, and the newly created clone is able to
leverage a power10 insn, so we have:
// specialized clone with the constant argument propagated
0000000010000840 <_Z7BitCasti.constprop.0>:
10000840: 7f 3f 00 05 xxspltidp vs1,1065353215
10000844: ff ff 24 80
10000848: 20 00 80 4e blr
The global variable initialization, however, still uses power8 insns and calls
the specialized clone:
0000000010000940 <_GLOBAL__sub_I__Z7BitCasti>:
10000940: 02 10 40 3c lis r2,4098
10000944: 00 7f 42 38 addi r2,r2,32512
10000948: a6 02 08 7c mflr r0
1000094c: 10 00 01 f8 std r0,16(r1)
10000950: e1 ff 21 f8 stdu r1,-32(r1)
10000954: dd fe ff 4b bl 10000830 <00000184.long_branch.184:6>
10000958: 18 00 41 e8 ld r2,24(r1)
1000095c: 20 00 21 38 addi r1,r1,32
10000960: 00 00 00 60 nop
10000964: 10 00 01 e8 ld r0,16(r1)
10000968: 5c 81 22 d0 stfs f1,-32420(r2)
1000096c: a6 03 08 7c mtlr r0
10000970: 20 00 80 4e blr
If we specify -mcpu=power10 -flto, we can see that _GLOBAL__sub_I__Z7BitCasti
directly adopts p10 insns (which implicitly indicates that, with the default
-mcpu=power8, the inliner considers it unsafe to inline
_Z7BitCasti.constprop.0):
0000000010000900 <_GLOBAL__sub_I__Z7BitCasti>:
10000900: 7f 3f 00 05 xxspltidp vs0,1065353215
10000904: ff ff 04 80
10000908: 01 00 10 06 pstfs f0,128852 # 1002005c <kNearOneF>
1000090c: 54 f7 00 d0
10000910: 20 00 80 4e blr
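
For what it's worth, one way a test could keep a power10-targeted function from
being reached on older hardware is to guard the call at run time. The following
is only a minimal sketch, not something taken from the reporter's project: the
names BitCastBaseline, BitCastP10 and BitCastDispatch are hypothetical, and it
assumes GCC on powerpc64 with a GCC and glibc recent enough to support
__builtin_cpu_supports ("arch_3_1"). On other targets the guarded part is
compiled out and only the baseline path remains.

```cpp
#include <cstring>

// Baseline version: built for the default -mcpu, so it is safe on any CPU
// the program is otherwise allowed to run on.
static float BitCastBaseline(int in) {
  float out;
  std::memcpy(&out, &in, sizeof(out));
  return out;
}

#if defined(__GNUC__) && defined(__powerpc64__)
// Assumption: GCC on powerpc64. This version (and any clone of it) may use
// ISA 3.1 insns such as xxspltidp, so it must only run on power10 or later.
__attribute__((target("cpu=power10"), noinline))
static float BitCastP10(int in) {
  float out;
  std::memcpy(&out, &in, sizeof(out));
  return out;
}
#endif

// Hypothetical dispatcher: take the power10 path only when the running CPU
// reports ISA 3.1 support, otherwise fall back to the baseline version.
float BitCastDispatch(int in) {
#if defined(__GNUC__) && defined(__powerpc64__)
  if (__builtin_cpu_supports("arch_3_1"))
    return BitCastP10(in);
#endif
  return BitCastBaseline(in);
}
```

Calling BitCastDispatch(1065353215) gives the same value on either path, since
1065353215 is 0x3F7FFFFF, the largest float below 1.0f; only the insns used to
produce it differ.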