https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121957
--- Comment #6 from Alex Coplan <acoplan at gcc dot gnu.org> ---
The problem is that the `MEM_EXPR` information is already wrong at expand time. The RTL for bb 8 in expand is:

```
   52: NOTE_INSN_BASIC_BLOCK 8
   53: r117:DI=r96:DI-0x100            ; r117 = r96 - 256
   54: r118:V16QI=const_vector         ; r118 = {0, 0, ...}
   55: [r117:DI]=r118:V16QI            ; [r96 - 256] = {0}
   56: [r117:DI+0x10]=r118:V16QI       ; [r96 - 240] = {0}
   57: [r117:DI+0x20]=r118:V16QI       ; ...
   58: [r117:DI+0x30]=r118:V16QI
   59: [r117:DI+0x40]=r118:V16QI
   60: [r117:DI+0x50]=r118:V16QI
   61: [r117:DI+0x60]=r118:V16QI       ; ...
   62: [r117:DI+0x70]=r118:V16QI       ; [r96 - 144] = {0}
   63: r119:DI=0xffffffffffffffff      ; r119 = -1
   64: [r96:DI-0x100]=r119:DI          ; [r96 - 256] = -1
   65: r120:DI=r96:DI-0x80             ; r120 = r96 - 128
   66: r121:V16QI=const_vector         ; r121 = {0, 0, ...}
   67: [r120:DI]=r121:V16QI            ; [r96 - 128] = {0}
   68: [r120:DI+0x10]=r121:V16QI       ; [r96 - 112] = {0}
   69: [r120:DI+0x20]=r121:V16QI       ; ...
   70: [r120:DI+0x30]=r121:V16QI
   71: [r120:DI+0x40]=r121:V16QI
   72: [r120:DI+0x50]=r121:V16QI
   73: [r120:DI+0x60]=r121:V16QI       ; ...
   74: [r120:DI+0x70]=r121:V16QI       ; [r96 - 16] = {0}
   75: r122:DI=0xffffffffffffffff      ; r122 = -1
   76: [r96:DI-0x80]=r122:DI           ; [r96 - 128] = -1
   77: asm_operands
```

so we're dumping two V16DI vectors to the stack with identical contents: {-1, 0, ..., 0}. Let's zoom in on the full-fat RTL for i55, i56 and i67, i68:

```
(insn 55 54 56 8 (set (mem/c:V16QI (reg:DI 117) [1 v+0 S16 A128])
        (reg:V16QI 118)) "t.c":9:5 -1
     (nil))
(insn 56 55 57 8 (set (mem/c:V16QI (plus:DI (reg:DI 117)
                (const_int 16 [0x10])) [1 v+16 S16 A128])
        (reg:V16QI 118)) "t.c":9:5 -1
     (nil))
[...]
(insn 67 66 68 8 (set (mem/c:V16QI (reg:DI 120) [1 v+0 S16 A128])
        (reg:V16QI 121)) "t.c":9:5 -1
     (nil))
(insn 68 67 69 8 (set (mem/c:V16QI (plus:DI (reg:DI 120)
                (const_int 16 [0x10])) [1 v+16 S16 A128])
        (reg:V16QI 121)) "t.c":9:5 -1
     (nil))
```

we know from the annotated slim dump above that these insns store to the following addresses:

- i55: r96 - 256
- i56: r96 - 240
- i67: r96 - 128
- i68: r96 - 112

but the `MEM_EXPR` information for these insns is {v+0, v+16, v+0, v+16}. Both i55 and i67 claim to be at offset 0 in `v`, yet they store to addresses 128 bytes apart. The `MEM_EXPR` information is therefore inconsistent, and this leads to wrong code in `ldp_fusion1`. Looking at the pair-fusion dump we have:

```
[bb 7] tracking insn 55 via mem expr <var_decl 0x79337792de40 v> [L=0 FP=1, V16QImode, off=0]
[bb 7] tracking insn 56 via mem expr <var_decl 0x79337792de40 v> [L=0 FP=1, V16QImode, off=16]
[...]
[bb 7] tracking insn 67 via mem expr <var_decl 0x79337792de40 v> [L=0 FP=1, V16QImode, off=0]
[bb 7] tracking insn 68 via mem expr <var_decl 0x79337792de40 v> [L=0 FP=1, V16QImode, off=16]
```

which leads to the following candidate vectors:

```
merge_pairs [L=0], cand vecs (55, 67) x (56, 68)
```

so the pass believes that i55 and i67 store to the same location, and that they are adjacent to i56 and i68 (which it believes also store to the same location as each other). So we attempt this valid fusion:

```
fusing pair [L=0] (55,56), base=125, hazards: (-,-), move_range: (55,55)
```

and this incorrect fusion:

```
fusing pair [L=0] (57,68), base=125, hazards: (62,59), move_range: (59,59)
```

but the root cause is the inconsistent `MEM_EXPR` information, which has been wrong since at least expand. The wrong code arises because the incorrect stp fusion leaves only about half of the on-stack vectors initialized. The test then fails at runtime, because it branches to abort if it finds non-zero uninitialized bits on the stack.