https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112384
Bug ID: 112384
Summary: a non-constant vec dup should be improved
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64
Take:
```
#define vector __attribute__((vector_size(16)))
vector int f1(vector int t, int i)
{
  i &= 3;
  vector int tt = {i, i, i, i};
  vector int r = __builtin_shuffle(t, tt);
  return r;
}
vector int f2(vector int t, int i)
{
  i &= 3;
  i = t[i];
  vector int tt = {i, i, i, i};
  return tt;
}
```
Neither of these gets good code generation.
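For reference, both functions should produce the same value: lane `i & 3` of `t` broadcast to all four lanes. The sketch below models that in portable C without `__builtin_shuffle`. The names `v4si`, `f1_model`, `f2_model` and `models_agree` are illustrative and not part of the report, and the model assumes the documented behaviour that shuffle mask elements are taken modulo the number of lanes.

```c
#include <string.h>

/* Illustrative typedef; the report uses a `vector` macro instead. */
typedef int v4si __attribute__((vector_size(16)));

/* Lane-by-lane model of f1: a shuffle whose mask is {i, i, i, i}. */
static v4si f1_model(v4si t, int i)
{
    i &= 3;
    v4si tt = {i, i, i, i};
    v4si r = {0, 0, 0, 0};
    for (int k = 0; k < 4; k++)
        r[k] = t[tt[k] & 3];   /* mask elements are taken modulo 4 */
    return r;
}

/* Same body as f2: extract one lane, then duplicate it. */
static v4si f2_model(v4si t, int i)
{
    i &= 3;
    int x = t[i];
    v4si tt = {x, x, x, x};
    return tt;
}

/* Returns 1 when both models agree for index i on a fixed test vector. */
static int models_agree(int i)
{
    v4si t = {10, 20, 30, 40};
    v4si a = f1_model(t, i);
    v4si b = f2_model(t, i);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Since both models reduce to a broadcast of a single lane, the two functions are natural candidates for the same code sequence.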
f1 has:
```
dup v31.4s, w0
...
shl v31.4s, v31.4s, 2
tbl v31.16b, {v31.16b}, v28.16b
add v31.16b, v31.16b, v29.16b
```
But we could do better by combining the dup and the shl into a single dup of the already-shifted scalar.
At the RTL level:
Trying 11 -> 12:
11: r98:V4SI=vec_duplicate(r92:SI)
REG_DEAD r92:SI
12: r101:V4SI=r98:V4SI<<const_vector
REG_DEAD r98:V4SI
Failed to match this instruction:
(set (reg:V4SI 101)
(ashift:V4SI (vec_duplicate:V4SI (reg/v:SI 92 [ iD.4390 ]))
(const_vector:V4SI [
(const_int 2 [0x2]) repeated x4
])))
Changing that into:
(set (reg:V4SI 101)
(vec_duplicate:V4SI (ashift:SI (reg/v:SI 92 [ iD.4390 ]) (const_int 2 [0x2]))))
would improve things.
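The rewrite relies on a lane-wise shift of a duplicated scalar being equal to duplicating the shifted scalar. A minimal source-level sketch of that identity, using GCC's generic vector extension, is given below. The names `v4si`, `dup_then_shift`, `shift_then_dup` and `forms_agree` are illustrative only, and the sketch says nothing about which instructions a given compiler actually emits.

```c
#include <string.h>

/* Illustrative typedef; the report uses a `vector` macro instead. */
typedef int v4si __attribute__((vector_size(16)));

/* Shape combine sees today: duplicate first, then shift every lane by 2. */
static v4si dup_then_shift(int i)
{
    v4si tt = {i, i, i, i};
    v4si two = {2, 2, 2, 2};
    return tt << two;
}

/* Proposed shape: shift the scalar once, then duplicate the result. */
static v4si shift_then_dup(int i)
{
    int s = i << 2;
    v4si tt = {s, s, s, s};
    return tt;
}

/* Returns 1 when both forms produce identical vectors for index i. */
static int forms_agree(int i)
{
    v4si a = dup_then_shift(i);
    v4si b = shift_then_dup(i);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

With `i` already masked to the range 0..3, as in `f1`, the shift cannot overflow, so the two forms agree for every input the testcase can produce.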
The first tbl looks removable too.