https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122249
Bug ID: 122249
Summary: Generic vectors sometimes leave a store inside a loop
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Blocks: 122219
Target Milestone: ---
Target: aarch64-linux-gnu
Take:
```
typedef float __m256 __attribute__((vector_size(8*sizeof(float))));
void foo(__m256& v, unsigned int n) {
  __m256 f2 = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f};
  for (int i = 0; i < n; i++) {
    f2 = f2 + f2;
    v = f2;
  }
}
```
On aarch64 at -O2 we get:
```
.L3:
add w2, w2, 1
ldp q31, q30, [sp, 32]
fadd v31.4s, v31.4s, v31.4s
fadd v30.4s, v30.4s, v30.4s
stp q31, q30, [sp, 32]
cmp w1, w2
bne .L3
```
```
<bb 4> [local count: 955630224]:
# f2_15 = PHI <f2_7(4), { 1.0e+0, 2.0e+0, 3.0e+0, 4.0e+0, 5.0e+0, 6.0e+0, 7.0e+0, 8.0e+0 }(3)>
# i_16 = PHI <i_10(4), 0(3)>
_14 = BIT_FIELD_REF <f2_15, 128, 0>;
_1 = _14 + _14;
_3 = BIT_FIELD_REF <f2_15, 128, 128>;
_11 = _3 + _3;
f2_7 = {_1, _11};
i_10 = i_16 + 1;
if (_9 != i_10)
goto <bb 4>; [89.00%]
else
goto <bb 5>; [11.00%]
```
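The two listings above are the assembly and the corresponding GIMPLE for the inner loop.
The store that remains in the loop, the stp to [sp, 32], is the store to f2.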
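For comparison, here is a rough sketch of a hand-split variant that carries the two 128-bit halves as separate loop variables, which mirrors what the GIMPLE above does with the two BIT_FIELD_REFs. The names v8sf, v4sf, foo_orig and foo_split are made up for illustration and are not part of the testcase. Whether this form actually avoids the stack round-trip on a given GCC version has not been checked here; it only shows the equivalent computation.

```
// Generic vector types (GCC/Clang extension); names are hypothetical.
typedef float v8sf __attribute__((vector_size(8 * sizeof(float))));
typedef float v4sf __attribute__((vector_size(4 * sizeof(float))));

// Same computation as foo in the report: one 256-bit value carried
// around the loop and stored to v on every iteration.
void foo_orig(v8sf& v, unsigned int n) {
  v8sf f2 = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f};
  for (unsigned int i = 0; i < n; i++) {
    f2 = f2 + f2;
    v = f2;
  }
}

// Hand-split form: each 128-bit half is its own loop-carried variable,
// and v is written once after the loop (only if the loop ran at all,
// to match the original, which leaves v untouched when n == 0).
void foo_split(v8sf& v, unsigned int n) {
  v4sf lo = {1.0f, 2.0f, 3.0f, 4.0f};
  v4sf hi = {5.0f, 6.0f, 7.0f, 8.0f};
  if (n == 0)
    return;
  for (unsigned int i = 0; i < n; i++) {
    lo = lo + lo;
    hi = hi + hi;
  }
  v8sf r = {lo[0], lo[1], lo[2], lo[3], hi[0], hi[1], hi[2], hi[3]};
  v = r;
}
```

Both functions should leave the same value in v for any n, so the second can serve as a reference for what the loop body could look like once f2 no longer lives in memory.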
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122219
[Bug 122219] Missed store sinking when using memcpy with vector_size type in inlined functions