https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119091

Milan Tripkovic <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #4 from Milan Tripkovic <[email protected]> ---
In the cse1 pass (as seen in this Godbolt link:
https://godbolt.org/z/bYGbffTj9), the compiler performs early pattern
recognition that leads to suboptimal code generation. The following
transformations occur:
  Instruction Merge 1: set r140 (mv) + add const 0x101 is transformed into set
r141 const 0x1010101.
        Transformation: mv + add -> mvconst_internal
  Instruction Merge 2: set r141 + ashift const 0x20 is transformed into set
r139 const 0x101010100000000.
        Transformation: mvconst_internal (mv + add) + shift -> mvconst_internal
When these patterns are eventually processed by the split1 pass, they expand
into five instructions: MV, ADD, MV, ADD, and SHIFT. The MV + ADD pair that
builds 0x1010101 is therefore emitted twice.
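
As a quick sanity check of the two merged constants, here is a minimal C
sketch. The helper names (fold_mv_add, fold_shift, check_merged_constants) are
made up for illustration and are not GCC functions; only the constants come
from the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Merge 1: mv const + add const folds into a single constant (r141). */
static uint64_t fold_mv_add(uint64_t mv_const, uint64_t add_const)
{
    return mv_const + add_const;
}

/* Merge 2: the folded constant shifted left folds again (r139). */
static uint64_t fold_shift(uint64_t value, unsigned shift)
{
    return value << shift;
}

/* Check both folds against the values quoted above. */
static void check_merged_constants(void)
{
    uint64_t r141 = fold_mv_add(0x1010000, 0x101);
    uint64_t r139 = fold_shift(r141, 0x20);

    assert(r141 == 0x1010101);
    assert(r139 == 0x101010100000000ULL);
}
```

Calling check_merged_constants() from any test harness should pass silently.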

We tried to fix this by disabling mvconst_internal pattern recognition in
recog.cc:insn_invalid_p for the cse pass and in changes.cc:recog_level2 for the
fwprop pass, which keeps the pattern from being recognized until the combine
pass.

New RTL State at the start of the combine pass:
```
(note 5 0 4 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 4 5 8 2 NOTE_INSN_FUNCTION_BEG)
(insn 8 4 9 2 (set (reg:DI 140)
        (const_int 16842752 [0x1010000])) "1.c":8:3 275 {*movdi_64bit})
(insn 9 8 10 2 (set (reg:DI 141)
        (plus:DI (reg:DI 140)
            (const_int 257 [0x101]))) "1.c":8:3 5 {*adddi3}
     (expr_list:REG_EQUAL (const_int 16843009 [0x1010101])))
(insn 10 9 11 2 (set (reg:DI 139)
        (ashift:DI (reg:DI 141)
            (const_int 32 [0x20]))) "1.c":8:3 297 {ashldi3}
     (expr_list:REG_EQUAL (const_int 72340172821233664 [0x101010100000000])))
(insn 11 10 14 2 (set (reg:DI 138 [ t ])
        (asm_operands:DI ("") ("=r") 0 [(reg:DI 139)] ...)))
(insn 14 11 19 2 (set (reg:DI 142 [ _2 ])
        (ior:DI (reg:DI 138 [ t ])
            (reg:DI 141))) "1.c":9:12 107 {*iordi3})
```
By delaying the transformation, the combine pass handles the logic more
efficiently:
    It first merges insn 8 and insn 9 into a single set: (set (reg:DI 141)
(const_int 16843009 [0x1010101])).
    It then recognizes the mvconst_internal pattern (mv + shift).
    Consequently, only one mvconst_internal is generated.
Final Result:
During the split1 pass, this will expand into only three instructions (MV, ADD,
SHIFT) instead of five, successfully eliminating the redundancy.
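
To illustrate why the two expansions are interchangeable, the sketch below
runs both sequences on a toy register file and checks that r139 and r141 end
up with the same values. This is only a model under stated assumptions: the
struct, enum, and function names are hypothetical, the TMP register is an
invented stand-in for whatever scratch register the split uses, and the
MV/ADD/SHIFT names are taken from the text above rather than from real
assembly output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum op { OP_MV, OP_ADD, OP_SHIFT };
enum { R139, R141, TMP, NREGS };   /* TMP is a hypothetical scratch reg */

struct insn {
    enum op op;
    int dest;       /* destination register index */
    int src;        /* source register index (ignored for OP_MV) */
    uint64_t imm;   /* immediate operand */
};

/* Early merge in cse1: each constant is rebuilt from scratch. */
static const struct insn five_insns[] = {
    { OP_MV,    R141, 0,    0x1010000 },
    { OP_ADD,   R141, R141, 0x101 },
    { OP_MV,    TMP,  0,    0x1010000 },
    { OP_ADD,   TMP,  TMP,  0x101 },
    { OP_SHIFT, R139, TMP,  32 },
};

/* Merge delayed until combine: the shift reuses r141. */
static const struct insn three_insns[] = {
    { OP_MV,    R141, 0,    0x1010000 },
    { OP_ADD,   R141, R141, 0x101 },
    { OP_SHIFT, R139, R141, 32 },
};

/* Run a sequence from a zeroed register file and return one register. */
static uint64_t final_reg(const struct insn *seq, size_t n, int reg)
{
    uint64_t regs[NREGS] = { 0 };

    for (size_t i = 0; i < n; i++) {
        const struct insn *in = &seq[i];
        switch (in->op) {
        case OP_MV:    regs[in->dest] = in->imm; break;
        case OP_ADD:   regs[in->dest] = regs[in->src] + in->imm; break;
        case OP_SHIFT: regs[in->dest] = regs[in->src] << in->imm; break;
        }
    }
    return regs[reg];
}

/* Both sequences must leave the same values in r139 and r141. */
static void check_equivalence(void)
{
    assert(final_reg(five_insns, 5, R139) == final_reg(three_insns, 3, R139));
    assert(final_reg(five_insns, 5, R141) == final_reg(three_insns, 3, R141));
}
```

The model says nothing about what split1 really emits; it only shows that the
shorter sequence computes the same two constants.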

Is this diagnosis of the root cause as premature pattern recognition in cse1
correct? If not, what direction should be taken to properly address this issue?
