https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565
--- Comment #14 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
With the simpler test case we see

Breakpoint 1, try_combine (i3=0x7ffff64d33c0, i2=0x7ffff64d3380, i1=0x0,
    i0=0x0, new_direct_jump_p=0x7fffffffd850,
    last_combined_insn=0x7ffff64d33c0)
    at /home/rearnsha/gnusrc/gcc-cross/master/gcc/combine.c:2671
2671    {

(nil)
(nil)
(insn 7 4 8 2 (set (reg/v:SI 96 [ a ])
        (and:SI (reg:SI 104)
            (const_int 14 [0xe]))) "/tmp/t2.c":3:7 535 {andsi3}
     (expr_list:REG_DEAD (reg:SI 104)
        (nil)))
(insn 8 7 10 2 (set (reg:DI 99 [ a ])
        (sign_extend:DI (reg/v:SI 96 [ a ]))) "/tmp/t2.c":4:13 106 {*extendsidi2_aarch64}
     (nil))

And then the resulting insn that we try is

(parallel [
        (set (reg:DI 99 [ a ])
            (and:DI (subreg:DI (reg:SI 104) 0)
                (const_int 14 [0xe])))
        (set (reg/v:SI 96 [ a ])
            (and:SI (reg:SI 104)
                (const_int 14 [0xe])))
    ])

This insn doesn't match, and so we try to break it into two set insns and try
those individually.  But that gives us back insn 7 again, plus a new insn based
on the (now extended) lifetime of r104.

It seems to me that if we are doing this sort of transformation, then it's
only likely to be profitable if the cost of the really new insn is strictly
cheaper than what we had before.  Being the same cost is not enough in this
case.
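
To make the distinction in the last paragraph concrete, here is a minimal sketch of the two acceptance rules. It is not code from combine.c: the struct, the function names and the cost values are all hypothetical and chosen only for illustration. It assumes each sequence can be summarised by one integer cost per insn.

```c
#include <stdbool.h>

/* Illustrative costs for a two-insn sequence before and after a combine
   attempt.  The names and values are hypothetical, not taken from
   combine.c.  */
struct seq_cost
{
  int first;   /* cost of the first insn (insn 7 in the dump above) */
  int second;  /* cost of the second insn (insn 8, or its replacement) */
};

static int
total_cost (struct seq_cost c)
{
  return c.first + c.second;
}

/* Permissive rule: accept the replacement unless it is more expensive.  */
static bool
accept_if_not_worse (struct seq_cost before, struct seq_cost after)
{
  return total_cost (after) <= total_cost (before);
}

/* Stricter rule suggested above: accept only a real improvement.  */
static bool
accept_if_strictly_cheaper (struct seq_cost before, struct seq_cost after)
{
  return total_cost (after) < total_cost (before);
}
```

With equal costs before and after (say 4 and 4 for both insns), the permissive rule accepts the split even though insn 7 comes back unchanged and r104 now lives longer, while the strict rule rejects it.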