Hi!

This patch uses a peephole2 to optimize __sync_fetch_and_add (x, -N) == N and __sync_add_and_fetch (x, N) == 0 into just lock {add,sub,inc,dec} followed by a flags test, instead of lock xadd plus a comparison. The sync_old_add<mode> predicate change makes it possible to optimize __sync_add_and_fetch with a constant second argument to the same code as __sync_fetch_and_add. Doing it in peephole2 has disadvantages though: the 3 insns need to be consecutive, and e.g. the xadd insn still has to be supported by the CPU. The other alternative would be to come up with a new bool builtin that would represent the whole __sync_fetch_and_add (x, -N) == N operation (perhaps with a dot or space in its name to make it inaccessible to user code), try to match it during some folding, and expand it using a special optab.
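For illustration, this is the kind of source pattern the peephole2 targets (my own example, not taken from the PR's testcase):

  int v;

  int
  dec_hit_zero (void)
  {
    /* True iff the old value was 1, i.e. the counter just reached 0.  */
    return __sync_fetch_and_add (&v, -1) == 1;
  }

Previously this would generate a mov $-1 / lock xadd / cmp $1 sequence; with the patch it should become just lock dec (or lock sub $1 when TARGET_USE_INCDEC is off) followed by a flags consumer such as sete or a conditional jump. __sync_add_and_fetch (&v, -1) == 0 should get the same treatment thanks to the predicate change.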
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk this way?

2011-05-16  Jakub Jelinek  <ja...@redhat.com>

	PR target/48986
	* config/i386/sync.md (sync_old_add<mode>): Relax operand 2
	predicate to allow CONST_INT.
	(*sync_old_add_cmp<mode>): New insn and peephole2 for it.

--- gcc/config/i386/sync.md.jj	2010-05-21 11:46:29.000000000 +0200
+++ gcc/config/i386/sync.md	2011-05-16 14:42:08.000000000 +0200
@@ -170,11 +170,62 @@ (define_insn "sync_old_add<mode>"
 	  [(match_operand:SWI 1 "memory_operand" "+m")] UNSPECV_XCHG))
    (set (match_dup 1)
 	(plus:SWI (match_dup 1)
-		  (match_operand:SWI 2 "register_operand" "0")))
+		  (match_operand:SWI 2 "nonmemory_operand" "0")))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_XADD"
   "lock{%;} xadd{<imodesuffix>}\t{%0, %1|%1, %0}")
 
+(define_peephole2
+  [(set (match_operand:SWI 0 "register_operand" "")
+	(match_operand:SWI 2 "const_int_operand" ""))
+   (parallel [(set (match_dup 0)
+		   (unspec_volatile:SWI
+		     [(match_operand:SWI 1 "memory_operand" "")] UNSPECV_XCHG))
+	      (set (match_dup 1)
+		   (plus:SWI (match_dup 1)
+			     (match_dup 0)))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (reg:CCZ FLAGS_REG)
+	(compare:CCZ (match_dup 0)
+		     (match_operand:SWI 3 "const_int_operand" "")))]
+  "peep2_reg_dead_p (3, operands[0])
+   && (unsigned HOST_WIDE_INT) INTVAL (operands[2])
+      == -(unsigned HOST_WIDE_INT) INTVAL (operands[3])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])"
+  [(parallel [(set (reg:CCZ FLAGS_REG)
+		   (compare:CCZ (unspec_volatile:SWI [(match_dup 1)]
+						     UNSPECV_XCHG)
+				(match_dup 3)))
+	      (set (match_dup 1)
+		   (plus:SWI (match_dup 1)
+			     (match_dup 2)))])])
+
+(define_insn "*sync_old_add_cmp<mode>"
+  [(set (reg:CCZ FLAGS_REG)
+	(compare:CCZ (unspec_volatile:SWI
+		       [(match_operand:SWI 0 "memory_operand" "+m")]
+		       UNSPECV_XCHG)
+		     (match_operand:SWI 2 "const_int_operand" "i")))
+   (set (match_dup 0)
+	(plus:SWI (match_dup 0)
+		  (match_operand:SWI 1 "const_int_operand" "i")))]
+  "(unsigned HOST_WIDE_INT) INTVAL (operands[1])
+   == -(unsigned HOST_WIDE_INT) INTVAL (operands[2])"
+{
+  if (TARGET_USE_INCDEC)
+    {
+      if (operands[1] == const1_rtx)
+	return "lock{%;} inc{<imodesuffix>}\t%0";
+      if (operands[1] == constm1_rtx)
+	return "lock{%;} dec{<imodesuffix>}\t%0";
+    }
+
+  if (x86_maybe_negate_const_int (&operands[1], <MODE>mode))
+    return "lock{%;} sub{<imodesuffix>}\t{%1, %0|%0, %1}";
+
+  return "lock{%;} add{<imodesuffix>}\t{%1, %0|%0, %1}";
+})
+
 ;; Recall that xchg implicitly sets LOCK#, so adding it again wastes space.
 (define_insn "sync_lock_test_and_set<mode>"
   [(set (match_operand:SWI 0 "register_operand" "=<r>")

	Jakub