On Mon, Apr 25, 2022 at 1:16 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > This patch addresses a (minor) missed-optimization regression revealed > by Richard Biener's example/variant in comment #1 of PR target/92578. > > int foo(int moves, int movecnt, int komove) { > int newcnt = movecnt; > if (moves == komove) > newcnt -= 2; > return newcnt; > } > > Comparing code generation on godbolt.org shows an interesting evolution > over time, as changes in register allocation affect the cmove sequence. > > GCC 4.1.2 (4 instructions, suboptimal mov after cmov). > leal -2(%rsi), %eax > cmpl %edx, %edi > cmove %eax, %esi > movl %esi, %eax > > GCC 4.4-4.7 (3 instructions, optimal) > leal -2(%rsi), %eax > cmpl %edx, %edi > cmovne %esi, %eax > > GCC 5-7 (4 instructions, suboptimal mov before cmov) > leal -2(%rsi), %ecx > movl %esi, %eax > cmpl %edx, %edi > cmove %ecx, %eax > > GCC 8 (4 instructions, suboptimal mov before cmov, reordered) > movl %esi, %eax > leal -2(%rsi), %ecx > cmpl %edx, %edi > cmove %ecx, %eax > > GCC 9-trunk (5 instructions, two suboptimal movs before cmov) > movl %edx, %ecx > movl %esi, %eax > leal -2(%rsi), %edx > cmpl %ecx, %edi > cmove %edx, %eax > > The challenge is that x86's two operand conditional moves, that require > the destination to be one of the (register) sources, are tricky for reload, > whose heuristics unify pseudos early (greedily?). In this case, we have > the equivalent of "pseudo1 = cond ? pseudo2 : expression", and we'd like > to see "pseudo1 = expression; pseudo1 = cond ? pseudo1 : pseudo2", but > alas reload (currently and quite reasonably) prefers to place pseudo1 and > pseudo2 in the same hard register if possible. Hence the solution is to > fixup/tweak the register allocation during peephole2, as previously with > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575998.html > > Instead of a single peephole2 to catch just the current idiom (last above), > I've added the four peephole2s that would catch each of the (historical) > suboptimal variants above and transform them into the ideal 3 insn form. > Instrumenting the compiler shows, for example, that the (earliest) movl > after cmov triggers over 50 times during stage2 of a GCC bootstrap. > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32}, with > no new failures. Ok for mainline? Or if this regression isn't serious > enough for stage4 (or these patterns considered too risky), for stage1 > when it reopens? I suspect the poor interaction between cmove usage > and register allocation is one source of confusion when comparing code > generation with vs. without cmove (the other major source of confusion > being that well-predicted branches are free, but that prediction-quality > is poorly predictable). > > > 2022-04-25 Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > PR target/92578 > * config/i386/i386.md (peephole2): Eliminate register-to-register > moves by inverting the condition of a conditional move. > > gcc/testsuite/ChangeLog > PR target/92578 > * gcc.target/i386/pr92758.c: New test case.
+;; Eliminate a reg-reg mov by inverting the condition of a cmov (#3). +;; cmov r0,r1; mov r1,r0 -> cmov r1,r0 +(define_peephole2 + [(set (match_operand:SWI248 0 "general_reg_operand") + (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator" + [(reg FLAGS_REG) (const_int 0)]) + (match_operand:SWI248 2 "general_reg_operand") + (match_operand:SWI248 3 "general_reg_operand"))) + (set (match_operand:SWI248 4 "general_reg_operand") + (match_dup 0))] + "TARGET_CMOVE + && ((REGNO (operands[0]) == REGNO (operands[2]) + && REGNO (operands[3]) == REGNO (operands[4])) + || (REGNO (operands[0]) == REGNO (operands[3]) + && REGNO (operands[2]) == REGNO (operands[4]))) + && peep2_reg_dead_p (2, operands[0])" + [(set (match_dup 4) (if_then_else:SWI248 (match_dup 1) + (match_dup 2) + (match_dup 3)))]) We have a valid cmov insn here, so no need to match operand 0 with 2 or 3. But it doesn't hurt to have some extra level of safety. OTOH, splitting this pattern to two and using (match_dup X) instead would be IMO more comprehensible. +;; Eliminate a reg-reg mov by inverting the condition of a cmov (#5). +;; mov x,r0; mov r1,r2; cmp; cmov r0,r2 -> mov x,r2; cmp; cmov r1,r2 +(define_peephole2 + [(set (match_operand:SWI248 0 "general_reg_operand") + (match_operand:SWI248 1)) You probably want the "general_gr_operand" predicate here. Otherwise, LEA also fits here, which is probably not what you intended. I think that we can apply the first pattern (which probably represents 90% of matches), which looks simple and safe. Others look a bit too complicated (and hard to review) for this stage. Uros. > > Thanks in advance, > Roger > -- >