On Mon, Apr 25, 2022 at 1:16 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch addresses a (minor) missed-optimization regression revealed
> by Richard Biener's example/variant in comment #1 of PR target/92578.
>
> int foo(int moves, int movecnt, int komove) {
>     int newcnt = movecnt;
>     if (moves == komove)
>         newcnt -= 2;
>     return newcnt;
> }
>
> Comparing code generation on godbolt.org shows an interesting evolution
> over time, as changes in register allocation affect the cmove sequence.
>
> GCC 4.1.2 (4 instructions, suboptimal mov after cmov).
>         leal    -2(%rsi), %eax
>         cmpl    %edx, %edi
>         cmove   %eax, %esi
>         movl    %esi, %eax
>
> GCC 4.4-4.7 (3 instructions, optimal)
>         leal    -2(%rsi), %eax
>         cmpl    %edx, %edi
>         cmovne  %esi, %eax
>
> GCC 5-7 (4 instructions, suboptimal mov before cmov)
>         leal    -2(%rsi), %ecx
>         movl    %esi, %eax
>         cmpl    %edx, %edi
>         cmove   %ecx, %eax
>
> GCC 8 (4 instructions, suboptimal mov before cmov, reordered)
>         movl    %esi, %eax
>         leal    -2(%rsi), %ecx
>         cmpl    %edx, %edi
>         cmove   %ecx, %eax
>
> GCC 9-trunk (5 instructions, two suboptimal movs before cmov)
>         movl    %edx, %ecx
>         movl    %esi, %eax
>         leal    -2(%rsi), %edx
>         cmpl    %ecx, %edi
>         cmove   %edx, %eax
>
> The challenge is that x86's two operand conditional moves, that require
> the destination to be one of the (register) sources, are tricky for reload,
> whose heuristics unify pseudos early (greedily?).  In this case, we have
> the equivalent of "pseudo1 = cond ? pseudo2 : expression", and we'd like
> to see "pseudo1 = expression; pseudo1 = cond ? pseudo1 : pseudo2", but
> alas reload (currently and quite reasonably) prefers to place pseudo1 and
> pseudo2 in the same hard register if possible.  Hence the solution is to
> fixup/tweak the register allocation during peephole2, as previously with
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575998.html
>
> Instead of a single peephole2 to catch just the current idiom (last above),
> I've added the four peephole2s that would catch each of the (historical)
> suboptimal variants above and transform them into the ideal 3 insn form.
> Instrumenting the compiler shows, for example, that the (earliest) movl
> after cmov triggers over 50 times during stage2 of a GCC bootstrap.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}, with
> no new failures.  Ok for mainline?  Or if this regression isn't serious
> enough for stage4 (or these patterns considered too risky), for stage1
> when it reopens?  I suspect the poor interaction between cmove usage
> and register allocation is one source of confusion when comparing code
> generation with vs. without cmove (the other major source of confusion
> being that well-predicted branches are free, but that prediction-quality
> is poorly predictable).
>
>
> 2022-04-25  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/92578
>         * config/i386/i386.md (peephole2): Eliminate register-to-register
>         moves by inverting the condition of a conditional move.
>
> gcc/testsuite/ChangeLog
>         PR target/92578
>         * gcc.target/i386/pr92758.c: New test case.

+;; Eliminate a reg-reg mov by inverting the condition of a cmov (#3).
+;; cmov r0,r1; mov r1,r0 -> cmov r1,r0
+(define_peephole2
+ [(set (match_operand:SWI248 0 "general_reg_operand")
+       (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
+                 [(reg FLAGS_REG) (const_int 0)])
+    (match_operand:SWI248 2 "general_reg_operand")
+    (match_operand:SWI248 3 "general_reg_operand")))
+  (set (match_operand:SWI248 4 "general_reg_operand")
+       (match_dup 0))]
+  "TARGET_CMOVE
+   && ((REGNO (operands[0]) == REGNO (operands[2])
+    && REGNO (operands[3]) == REGNO (operands[4]))
+       || (REGNO (operands[0]) == REGNO (operands[3])
+       && REGNO (operands[2]) == REGNO (operands[4])))
+   && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 4) (if_then_else:SWI248 (match_dup 1)
+                       (match_dup 2)
+                       (match_dup 3)))])

We have a valid cmov insn here, so no need to match operand 0 with 2
or 3. But it doesn't hurt to have some extra level of safety. OTOH,
splitting this pattern to two and using (match_dup X) instead would be
IMO more comprehensible.

+;; Eliminate a reg-reg mov by inverting the condition of a cmov (#5).
+;; mov x,r0; mov r1,r2; cmp; cmov r0,r2 -> mov x,r2; cmp; cmov r1,r2
+(define_peephole2
+ [(set (match_operand:SWI248 0 "general_reg_operand")
+       (match_operand:SWI248 1))

You probably want the "general_gr_operand" predicate here. Otherwise,
LEA also fits here, which is probably not what you intended.

I think that we can apply the first pattern (which probably represents
90% of matches), which looks simple and safe. Others look a bit too
complicated (and hard to review) for this stage.

Uros.





>
> Thanks in advance,
> Roger
> --
>

Reply via email to