https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121957

--- Comment #6 from Alex Coplan <acoplan at gcc dot gnu.org> ---
The problem is that the `MEM_EXPR` information is already wrong at expand time.
The RTL for bb 8 in expand is:

```
   52: NOTE_INSN_BASIC_BLOCK 8
   53: r117:DI=r96:DI-0x100       ; r117 = r96 - 256
   54: r118:V16QI=const_vector    ; r118 = {0, 0, ...}
   55: [r117:DI]=r118:V16QI       ; [r96 - 256] = {0}
   56: [r117:DI+0x10]=r118:V16QI  ; [r96 - 240] = {0}
   57: [r117:DI+0x20]=r118:V16QI  ; ...
   58: [r117:DI+0x30]=r118:V16QI
   59: [r117:DI+0x40]=r118:V16QI
   60: [r117:DI+0x50]=r118:V16QI
   61: [r117:DI+0x60]=r118:V16QI  ; ...
   62: [r117:DI+0x70]=r118:V16QI  ; [r96 - 144] = {0}
   63: r119:DI=0xffffffffffffffff ; r119 = -1
   64: [r96:DI-0x100]=r119:DI     ; [r96 - 256] = -1
   65: r120:DI=r96:DI-0x80        ; r120 = r96 - 128
   66: r121:V16QI=const_vector    ; r121 = {0, 0, ...}
   67: [r120:DI]=r121:V16QI       ; [r96 - 128] = {0}
   68: [r120:DI+0x10]=r121:V16QI  ; [r96 - 112] = {0}
   69: [r120:DI+0x20]=r121:V16QI  ; ...
   70: [r120:DI+0x30]=r121:V16QI
   71: [r120:DI+0x40]=r121:V16QI
   72: [r120:DI+0x50]=r121:V16QI
   73: [r120:DI+0x60]=r121:V16QI  ; ...
   74: [r120:DI+0x70]=r121:V16QI  ; [r96 - 16] = {0}
   75: r122:DI=0xffffffffffffffff ; r122 = -1
   76: [r96:DI-0x80]=r122:DI      ; [r96 - 128] = -1
   77: asm_operands
```

so we're dumping two V16DI vectors to the stack with identical contents: {-1,
0, ..., 0}.  Let's zoom in on the full-fat RTL for i55,i56 and i67,i68:

```
(insn 55 54 56 8 (set (mem/c:V16QI (reg:DI 117) [1 v+0 S16 A128])
        (reg:V16QI 118)) "t.c":9:5 -1
     (nil))
(insn 56 55 57 8 (set (mem/c:V16QI (plus:DI (reg:DI 117)
                (const_int 16 [0x10])) [1 v+16 S16 A128])
        (reg:V16QI 118)) "t.c":9:5 -1
     (nil))
[...]
(insn 67 66 68 8 (set (mem/c:V16QI (reg:DI 120) [1 v+0 S16 A128])
        (reg:V16QI 121)) "t.c":9:5 -1
     (nil))
(insn 68 67 69 8 (set (mem/c:V16QI (plus:DI (reg:DI 120)
                (const_int 16 [0x10])) [1 v+16 S16 A128])
        (reg:V16QI 121)) "t.c":9:5 -1
     (nil))

```

We know from the annotated slim dump above that these insns store to the
following addresses:
 - i55: r96 - 256
 - i56: r96 - 240
 - i67: r96 - 128
 - i68: r96 - 112
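but we see above that the `MEM_EXPR` information for these insns is {v+0, v+16,
v+0, v+16}: i55 and i67 both claim to access v+0 even though they store to
addresses 128 bytes apart (and likewise i56 and i68 for v+16).  Thus the
`MEM_EXPR` information is inconsistent, and this leads to wrong code in
`ldp_fusion1`.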

Looking at the pair-fusion dump we have:

```
[bb 7] tracking insn 55 via mem expr <var_decl 0x79337792de40 v> [L=0 FP=1, V16QImode, off=0]
[bb 7] tracking insn 56 via mem expr <var_decl 0x79337792de40 v> [L=0 FP=1, V16QImode, off=16]
[...]
[bb 7] tracking insn 67 via mem expr <var_decl 0x79337792de40 v> [L=0 FP=1, V16QImode, off=0]
[bb 7] tracking insn 68 via mem expr <var_decl 0x79337792de40 v> [L=0 FP=1, V16QImode, off=16]
```

which leads to the following candidate vectors:

```
merge_pairs [L=0], cand vecs (55, 67) x (56, 68)
```

so the pass believes that i55 and i67 store to the same location, and are
adjacent accesses to i56 and i68 (which it believes also store to the same
location).  So we attempt this valid fusion:

```
fusing pair [L=0] (55,56), base=125, hazards: (-,-), move_range: (55,55)
```

and this incorrect fusion:

```
fusing pair [L=0] (57,68), base=125, hazards: (62,59), move_range: (59,59)
```

but the root cause is the inconsistent `MEM_EXPR` information, which is already
wrong at expand time (and possibly earlier).

The wrong code occurs because the incorrect stp fusion leaves only about half of
the on-stack vectors initialized; the test then fails at runtime because it
branches to abort if any of the uninitialized bits on the stack are non-zero.
