[Bug target/81456] [7/8 Regression] x86-64 optimizer makes wrong decision when optimizing for size

jgreenhalgh at gcc dot gnu.org Mon, 17 Jul 2017 09:57:33 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81456


--- Comment #2 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #1)
> Confirmed, started with r238594.

The cost model relies on the target giving a reasonable approximation for an
instruction size through ix86_rtx_costs.

The basic branch structure looks like:


t = mod
if (a / b % 2)
  t = b - mod


In RTL, this looks like:

  (insn 14 13 15 2 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 99)
            (const_int 0 [0]))) "foo.c":5 3 {*cmpsi_ccno_1}
     (expr_list:REG_DEAD (reg:SI 99)
        (nil)))
  (jump_insn 15 14 16 2 (set (pc)
        (if_then_else (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (label_ref:DI 22)
            (pc))) "foo.c":5 617 {*jcc_1}
     (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (int_list:REG_BR_PROB 20000 (nil)))
   -> 22)

  (note 16 15 17 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
  (insn 17 16 22 3 (parallel [
            (set (reg/v:SI 93 [ <retval> ])
                (minus:SI (reg/v:SI 95 [ b ])
                    (reg/v:SI 93 [ <retval> ])))
            (clobber (reg:CC 17 flags))
        ]) "foo.c":5 273 {*subsi_1}
     (expr_list:REG_DEAD (reg/v:SI 95 [ b ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
  (code_label 22 17 25 4 1 (nil) [1 uses])

That is to say, we're starting with a comparison, a branch and a subtract. We
want to know if that sequence is cheaper than a subtract and a conditional
select.
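
At the source level, the two shapes being compared can be sketched roughly as
below. This is only an illustration: the function names are hypothetical, the
code is not the test case attached to the bug, unsigned types are used purely
to keep the unconditional subtract well defined in C, and the compiler is free
to emit either form for either function.

```c
/* Hypothetical illustration; not the test case from the bug report.
   Both functions assume b != 0. */

/* Branchy shape: compare, conditional jump, subtract on one path only. */
unsigned pick_branch(unsigned a, unsigned b)
{
    unsigned mod = a % b;
    unsigned t = mod;
    if (a / b % 2)
        t = b - mod;
    return t;
}

/* If-converted shape: the subtract runs unconditionally and a
   conditional select chooses between the two candidate values. */
unsigned pick_select(unsigned a, unsigned b)
{
    unsigned mod = a % b;
    unsigned diff = b - mod;            /* always executed */
    return (a / b % 2) ? diff : mod;    /* the select */
}
```

Both functions compute the same value; the only question for ifcvt is which
instruction sequence it estimates to be cheaper.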

In the cost model, we take an approximation for the branch and comparison of
COSTS_N_INSNS (2) and the backend tells us the cost of a subtract is
COSTS_N_INSNS (1). Thus, the cost before transformation is COSTS_N_INSNS (3) ==
12.

After the transformation, we create this RTL:

  (insn 31 0 32 (set (reg:SI 102)
        (reg/v:SI 93 [ <retval> ])) 82 {*movsi_internal}
       (nil))

  (insn 32 31 33 (parallel [
            (set (reg:SI 101)
                (minus:SI (reg/v:SI 95 [ b ])
                    (reg/v:SI 93 [ <retval> ])))
            (clobber (reg:CC 17 flags))
        ]) 273 {*subsi_1}
       (nil))

  (insn 33 32 34 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 99)
            (const_int 0 [0]))) 3 {*cmpsi_ccno_1}
       (nil))

  (insn 34 33 0 (set (reg/v:SI 93 [ <retval> ])
        (if_then_else:SI (ne (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg:SI 101)
            (reg:SI 102))) 966 {*movsicc_noc}
       (nil))

That is a set to protect the "false" value, the same subtract, a comparison to
set the flags, and a conditional move. When we ask the backend to give us costs
for this it gives us COSTS_N_INSNS (1) for the set, COSTS_N_INSNS (1) for the
subtract, COSTS_N_INSNS (1) for the comparison, and COSTS_N_INSNS (2) for the
conditional move. That's a total cost of COSTS_N_INSNS (5) == 20 for the whole
sequence. 20 > 12, so from the perspective of the ifcvt cost model this is a
bad transformation.
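
The arithmetic can be written out as a small sketch. GCC spells the macro
COSTS_N_INSNS in rtl.h, where it multiplies its argument by 4; the per-insn
costs are the ones quoted in this comment. The function names are hypothetical,
and the final comparison is a simplified stand-in for the real profitability
check in ifcvt, not a copy of it.

```c
/* Same scaling as GCC's rtl.h: one simple insn costs 4 units. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Original sequence: compare + branch (ifcvt's approximation), subtract. */
int cost_before(void)
{
    return COSTS_N_INSNS(2)     /* compare and branch */
         + COSTS_N_INSNS(1);    /* subtract */
}

/* If-converted sequence, using the backend costs quoted in this comment. */
int cost_after(void)
{
    return COSTS_N_INSNS(1)     /* set protecting the "false" value */
         + COSTS_N_INSNS(1)     /* subtract */
         + COSTS_N_INSNS(1)     /* compare */
         + COSTS_N_INSNS(2);    /* conditional move */
}

/* Assumed, simplified decision: accept the conversion only if the new
   sequence is no more expensive than the one it replaces. */
int conversion_is_profitable(void)
{
    return cost_after() <= cost_before();
}
```

With these inputs cost_before() is 12 and cost_after() is 20, so the sketch
rejects the conversion, matching the behaviour described here.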

Note that ifcvt is not aware that an extra set will be introduced after the
original subtract, nor does it care about the final movl %edx, %eax as that is
unconditional. It thinks it is being asked to trade test, branch, subtract for
set, subtract, test, conditional move - when you spell it out like that it
should be clear why it makes the decision it does.

I can't reproduce your comment about -m32 - I still see branches at -Os.

[Bug target/81456] [7/8 Regression] x86-64 optimizer makes wrong decision when optimizing for size
