On Tue, Sep 27, 2022 at 10:46 AM Robin Dapp via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> > I did bootstrapping and ran the testsuite on x86(-64), aarch64, Power9
> > and s390.  Everything looks good except two additional fails on x86
> > where the code actually looks worse.
> >
> > gcc.target/i386/keylocker-encodekey128.c:
> >
> > 17c17,18
> > <       movaps  %xmm4, k2(%rip)
> > ---
> > >       pxor    %xmm0, %xmm0
> > >       movaps  %xmm0, k2(%rip)
> >
> > gcc.target/i386/keylocker-encodekey256.c:
> >
> > 19c19,20
> > <       movaps  %xmm4, k3(%rip)
> > ---
> > >       pxor    %xmm0, %xmm0
> > >       movaps  %xmm0, k3(%rip)
>
> Before the patch and after postreload we have:
>
> (insn (set (reg:V2DI xmm0)
>            (reg:V2DI xmm4))
>       (expr_list:REG_DEAD (reg:V2DI 24 xmm4)
>           (expr_list:REG_EQUIV (const_vector:V2DI [
>                   (const_int 0 [0]) repeated x2
>               ]))))
> (insn (set (mem/c:V2DI (symbol_ref:DI ("k2")))
>            (reg:V2DI xmm0)))
>
> which is converted by cprop_hardreg to:
>
> (insn (set (mem/c:V2DI (symbol_ref:DI ("k2")))
>            (reg:V2DI xmm4)))
>
> With the change there is:
>
> (insn (set (reg:V2DI xmm0)
>            (const_vector:V2DI [
>                (const_int 0 [0]) repeated x2
>            ])))
> (insn (set (mem/c:V2DI (symbol_ref:DI ("k2")))
>            (reg:V2DI xmm0)))
>
> which is not simplified further because xmm0 needs to be explicitly
> zeroed, while xmm4 is assumed to be zeroed by encodekey128.  I'm not
> familiar with this, so I'm supposing this is correct even though I found
> "XMM4 through XMM6 are reserved for future usages and software should
> not rely upon them being zeroed." online.
I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107061

> Even if xmm4 were zeroed explicitly, I guess in this case the simple
> costing of mov reg,reg vs. mov reg,imm (with the latter not being more
> expensive) falls short?  cprop_hardreg can actually propagate the zeroed
> xmm4 into the next move.
> The same mechanism could possibly even elide many such moves, which
> would mean we'd otherwise unnecessarily emit many mov reg,0?

Hmm...  This sounds like an issue.

--
H.J.