https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122249

            Bug ID: 122249
           Summary: Generic vectors sometimes leaves a store inside a loop
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
            Blocks: 122219
  Target Milestone: ---
            Target: aarch64-linux-gnu

Take:
```
typedef float __m256 __attribute__((vector_size(8*sizeof(float))));

void foo(__m256& v, unsigned int n) {
  __m256 f2 = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f};
  for (int i = 0; i < n; i++) {
    f2 = f2 + f2;
    v = f2;
  }
}
```

On aarch64 at -O2 we get:
```
.L3:
        add     w2, w2, 1
        ldp     q31, q30, [sp, 32]
        fadd    v31.4s, v31.4s, v31.4s
        fadd    v30.4s, v30.4s, v30.4s
        stp     q31, q30, [sp, 32]
        cmp     w1, w2
        bne     .L3
```
```
  <bb 4> [local count: 955630224]:
  # f2_15 = PHI <f2_7(4), { 1.0e+0, 2.0e+0, 3.0e+0, 4.0e+0, 5.0e+0, 6.0e+0, 7.0e+0, 8.0e+0 }(3)>
  # i_16 = PHI <i_10(4), 0(3)>
  _14 = BIT_FIELD_REF <f2_15, 128, 0>;
  _1 = _14 + _14;
  _3 = BIT_FIELD_REF <f2_15, 128, 128>;
  _11 = _3 + _3;
  f2_7 = {_1, _11};
  i_10 = i_16 + 1;
  if (_9 != i_10)
    goto <bb 4>; [89.00%]
  else
    goto <bb 5>; [11.00%]
```

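The second block is the GIMPLE for the inner loop.

The store left inside the loop is the store to `f2`: the `stp` writes `f2` back to its stack slot at `[sp, 32]` on every iteration, paired with the `ldp` that reloads it. The store to `v` does not appear in the loop.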
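The GIMPLE shows that the 256-bit addition is already split into two 128-bit halves (the two `BIT_FIELD_REF`s). As a hypothetical comparison point, not part of the original report, the same computation can be written with the halves held in separate 128-bit vectors. The names `foo_split` and `v4sf` are made up for this sketch, and whether GCC then keeps the halves in `q` registers without the `ldp`/`stp` on aarch64 is an assumption that has not been checked here.

```cpp
// Same 256-bit generic vector type as in the test case above.
typedef float __m256 __attribute__((vector_size(8 * sizeof(float))));
// Hypothetical 128-bit half type (name chosen for this sketch only).
typedef float v4sf __attribute__((vector_size(4 * sizeof(float))));

// Hypothetical variant of foo(): the loop-carried value is kept as two
// 128-bit halves, mirroring BIT_FIELD_REF <f2_15, 128, 0> and
// BIT_FIELD_REF <f2_15, 128, 128> in the GIMPLE dump. v is still
// written on every iteration, exactly like "v = f2;" in foo().
void foo_split(__m256& v, unsigned int n) {
  v4sf lo = {1.0f, 2.0f, 3.0f, 4.0f};  // elements 0..3 of f2
  v4sf hi = {5.0f, 6.0f, 7.0f, 8.0f};  // elements 4..7 of f2
  for (unsigned int i = 0; i < n; i++) {
    lo = lo + lo;
    hi = hi + hi;
    // Reassemble the full 256-bit value and store it to v.
    __m256 f2 = {lo[0], lo[1], lo[2], lo[3], hi[0], hi[1], hi[2], hi[3]};
    v = f2;
  }
}
```

Like `foo`, `foo_split` leaves `v` untouched when `n == 0`; after three iterations `v` holds `{8, 16, 24, 32, 40, 48, 56, 64}`.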


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122219
[Bug 122219] Missed store sinking when using memcpy with vector_size type in inlined functions
