https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90460
Bug ID: 90460
Summary: Inefficient vector construction from pieces
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
Split out from PR90424
template <class T>
using V [[gnu::vector_size(16)]] = T;

template <class T, unsigned... I>
V<T> load(const void *p) {
  const T* q = static_cast<const T*>(p);
  V<T> r = {q[I]...};
  return r;
}

// expected: a single movq or movsd (8 bytes loaded)
template V<char > load<char , 0,1,2,3,4,5,6,7>(const void *);
template V<short > load<short , 0,1,2,3>(const void *);
template V<int > load<int , 0,1>(const void *);
template V<long > load<long , 0>(const void *);
template V<float > load<float , 0,1>(const void *);
template V<double> load<double, 0>(const void *);
// expected: a single movd or movss (4 bytes loaded)
template V<char > load<char , 0,1,2,3>(const void *);
template V<short> load<short, 0,1>(const void *);
template V<int > load<int , 0>(const void *);
template V<float> load<float, 0>(const void *);
ends up with IL like
load<int, 0, 1> (const void * p)
{
  V r;
  int _1;
  int _2;

  <bb 2> [local count: 1073741824]:
  _1 = MEM[(const int *)p_3(D)];
  _2 = MEM[(const int *)p_3(D) + 4B];
  r_5 = {_1, _2};
  return r_5;

}
which looks like a job for the bswap pass: the two adjacent loads could be merged into a single wider load.
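For illustration, here is a minimal sketch of the equivalence the optimizer would need to recognize for the load<int, 0, 1> case. The names v4si, load_pieces and load_merged are hypothetical and not part of the test case above; the sketch only assumes the GNU vector extension already used in the report.

```cpp
#include <cstring>

// Hypothetical 16-byte vector of four ints, matching V<int> in the test case.
typedef int v4si __attribute__((vector_size(16)));

// Element-wise construction, as load<int, 0, 1> does it:
// two scalar loads followed by a vector CONSTRUCTOR.
v4si load_pieces(const void *p) {
  const int *q = static_cast<const int *>(p);
  v4si r = {q[0], q[1]};  // remaining lanes are zero-initialized
  return r;
}

// The merged form: one 8-byte load into the low half of a zeroed vector,
// which is what a single movq from memory provides on x86.
v4si load_merged(const void *p) {
  v4si r = {};
  std::memcpy(&r, p, 8);  // copies lanes 0 and 1 in one go
  return r;
}
```

Both functions return the same value for any readable 8-byte input, so the first form could in principle be rewritten into the second, whether by the bswap pass or elsewhere.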