https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #16 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sa...@gcc.gnu.org>:

https://gcc.gnu.org/g:31cc9824d1cd5e0f7fa145d0831a923479333cd6

commit r14-5013-g31cc9824d1cd5e0f7fa145d0831a923479333cd6
Author: Roger Sayle <ro...@nextmovesoftware.com>
Date:   Mon Oct 30 16:17:42 2023 +0000

    ARC: Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.

    This patch overhauls the ARC backend's insn_cost target hook, and makes
    some related improvements to rtx_costs, BRANCH_COST, etc.  The primary
    goal is to allow the backend to indicate that shifts and rotates are
    slow (discouraged) when the CPU doesn't have a barrel shifter. I should
    also acknowledge Richard Sandiford for inspiring the use of set_cost
    in this rewrite of arc_insn_cost; this implementation borrows heavily
    from the target hooks for AArch64 and ARM.

    The motivating example is derived from PR rtl-optimization/110717.

    struct S { int a : 5; };
    unsigned int foo (struct S *p) {
      return p->a;
    }

    With a barrel shifter, GCC -O2 generates the reasonable:

    foo:    ldb_s   r0,[r0]
            asl_s   r0,r0,27
            j_s.d   [blink]
            asr_s   r0,r0,27

    What's interesting is that during combine, the middle-end actually
    has two shifts by three bits and a sign extension from QI to SI.

    Trying 8, 9 -> 11:
        8: r158:SI=r157:QI#0<<0x3
          REG_DEAD r157:QI
        9: r159:SI=sign_extend(r158:SI#0)
          REG_DEAD r158:SI
       11: r155:SI=r159:SI>>0x3
          REG_DEAD r159:SI
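
    In C terms, the merge that combine is considering can be modelled as
    follows (a standalone sketch; the function names are illustrative and
    not part of the patch):

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Before combine: shift the loaded byte left by 3, sign-extend the
       low byte (QI -> SI), then arithmetic-shift right by 3, matching
       insns 8, 9 and 11 in the RTL dump above.  */
    static int32_t two_short_shifts(uint8_t byte)
    {
        int8_t ext = (int8_t)(byte << 3);   /* r158/r159: << 3, sign_extend */
        return (int32_t)ext >> 3;           /* r155: arithmetic >> 3        */
    }

    /* After combine: one pair of 27-bit shifts on the full 32-bit word,
       which is what the barrel-shifter code (asl/asr by 27) computes.  */
    static int32_t two_long_shifts(uint8_t byte)
    {
        return (int32_t)((uint32_t)byte << 27) >> 27;
    }

    int main(void)
    {
        /* Both forms extract the same sign-extended 5-bit field.  */
        for (int b = 0; b < 256; b++)
            assert(two_short_shifts((uint8_t)b) == two_long_shifts((uint8_t)b));
        return 0;
    }
    ```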

    Whilst it's reasonable to simplify this to two shifts by 27 bits when
    the CPU has a barrel shifter, it's actually a significant pessimization
    when these shifts are implemented by loops.  This combination can be
    prevented if the backend provides accurate-ish estimates for insn_cost.
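
    The idea behind the cost estimate can be sketched in C (a simplified
    model only; the function name and interface are illustrative, not
    GCC's actual target-hook signatures, though COSTS_N_INSNS mirrors
    GCC's convention of 4 cost units per instruction):

    ```c
    #include <assert.h>

    /* GCC-style cost scale: 4 units per instruction.  */
    #define COSTS_N_INSNS(n) ((n) * 4)

    /* Without a barrel shifter, a shift by N is a loop of roughly N
       single-bit shifts, so its cost should scale with the shift count;
       with one, any constant shift is a single cheap instruction.  */
    static int shift_cost(int shift_count, int has_barrel_shifter)
    {
        if (has_barrel_shifter)
            return COSTS_N_INSNS(1);
        return COSTS_N_INSNS(shift_count);
    }

    int main(void)
    {
        /* With accurate costs, combine sees that two 27-bit shifts are
           worse than two 3-bit shifts plus a sign extension, and keeps
           the cheaper form.  */
        int merged = 2 * shift_cost(27, 0);
        int split  = 2 * shift_cost(3, 0) + COSTS_N_INSNS(1);
        assert(split < merged);
        return 0;
    }
    ```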

    Previously, without a barrel shifter, GCC -O2 -mcpu=em generates:

    foo:    ldb_s   r0,[r0]
            mov     lp_count,27
            lp      2f
            add     r0,r0,r0
            nop
    2:      # end single insn loop
            mov     lp_count,27
            lp      2f
            asr     r0,r0
            nop
    2:      # end single insn loop
            j_s     [blink]

    which contains two loops and requires ~113 cycles to execute.
    With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates:

    foo:    ldb_s   r0,[r0]
            mov_s   r2,0    ;3
            add3    r0,r2,r0
            sexb_s  r0,r0
            asr_s   r0,r0
            asr_s   r0,r0
            j_s.d   [blink]
            asr_s   r0,r0

    which requires only ~6 cycles, using the shorter shifts by 3 and a
    sign extension.

    2023-10-30  Roger Sayle  <ro...@nextmovesoftware.com>

    gcc/ChangeLog
            * config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
            Provide reasonable values for SHIFTS and ROTATES by constant
            bit counts depending upon TARGET_BARREL_SHIFTER.
            (arc_insn_cost): Use insn attributes if the instruction is
            recognized.  Avoid calling get_attr_length for type "multi",
            i.e. define_insn_and_split patterns without explicit type.
            Fall-back to set_rtx_cost for single_set and pattern_cost
            otherwise.
            * config/arc/arc.h (COSTS_N_BYTES): Define helper macro.
            (BRANCH_COST): Improve/correct definition.
            (LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.
