https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125547
Bug ID: 125547
Summary: Missed devirtualisation through std::function
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ipa
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
CC: hubicka at gcc dot gnu.org
Target Milestone: ---
The following C++ testcase:
#include <functional>
#include <cstddef>

struct V4 { float v[4]; };

static float plus_op(float a, float b) { return a + b; }

// Applies the element operation through a std::function.
static inline V4 loop4_stdfunc(const std::function<float(float,float)> &f,
                               const V4 &x, const V4 &y) {
  V4 out;
  for (int i = 0; i < 4; ++i) out.v[i] = f(x.v[i], y.v[i]);
  return out;
}

// Same loop, but the callable type is a template parameter.
template <class F>
static inline V4 loop4_tmpl(F f, const V4 &x, const V4 &y) {
  V4 out;
  for (int i = 0; i < 4; ++i) out.v[i] = f(x.v[i], y.v[i]);
  return out;
}

void add_stdfunction(V4 *__restrict o, const V4 *__restrict a,
                     const V4 *__restrict b, size_t n) {
  for (size_t i = 0; i < n; ++i) o[i] = loop4_stdfunc(plus_op, a[i], b[i]);
}

void add_template(V4 *__restrict o, const V4 *__restrict a,
                  const V4 *__restrict b, size_t n) {
  for (size_t i = 0; i < n; ++i) o[i] = loop4_tmpl(plus_op, a[i], b[i]);
}
is devirtualised by LLVM, which then generates a vectorised form of
add_stdfunction. GCC fails to devirtualise the call and generates an indirect branch instead:
https://godbolt.org/z/h7bKzM7W9
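To illustrate why the call stays indirect, here is a greatly simplified, hypothetical model of a type-erased callable. TinyFunction is made up for this sketch and is not libstdc++'s std::function implementation. The call site only sees a stored invoker pointer and a stored target pointer, so turning f(a, b) into a direct, inlinable call to plus_op requires propagating both values from the constructor to the call.

```cpp
#include <cassert>

// Hypothetical, greatly simplified model of a type-erased callable in the
// spirit of std::function (NOT libstdc++'s real implementation).
class TinyFunction {
public:
    using Fn = float (*)(float, float);

    // Store the target function pointer plus an invoker that knows how to call it.
    explicit TinyFunction(Fn fn) : target_(fn), invoker_(&invoke_fn_ptr) {}

    float operator()(float a, float b) const {
        // Indirect call: to make this direct, the optimizer has to prove which
        // invoker_ and target_ were stored by the constructor.
        return invoker_(target_, a, b);
    }

private:
    static float invoke_fn_ptr(Fn fn, float a, float b) { return fn(a, b); }

    Fn target_;
    float (*invoker_)(Fn, float, float);
};

static float plus_op(float a, float b) { return a + b; }
```

In the template version of the testcase there is no such stored state: once loop4_tmpl is inlined, the function pointer argument is a known constant at the call.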
This costs us significant performance on the 772.marian_r benchmark from
SPEC2026, which makes heavy use of these std::function abstractions to simulate vectors.
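One possible source-level mitigation is to check the stored target and call plus_op directly when it matches, which is conceptually what speculative devirtualisation would produce. This is a sketch under the assumption that the std::function was constructed from a plain function pointer, as in the testcase. loop4_stdfunc_spec is a hypothetical helper, and whether GCC then vectorises the direct-call branch is an assumption, not a measured result.

```cpp
#include <cassert>
#include <functional>

struct V4 { float v[4]; };

static float plus_op(float a, float b) { return a + b; }

// Hypothetical helper: hand-written speculative devirtualisation.
// If f wraps exactly plus_op, use a direct call that can be inlined;
// otherwise fall back to the indirect call through f.
static inline V4 loop4_stdfunc_spec(const std::function<float(float, float)> &f,
                                    const V4 &x, const V4 &y) {
    using FnPtr = float (*)(float, float);
    V4 out;
    // target<FnPtr>() points at the stored function pointer when f holds a
    // FnPtr, and is nullptr for any other callable type (e.g. a lambda).
    const FnPtr *p = f.target<FnPtr>();
    if (p && *p == &plus_op) {
        for (int i = 0; i < 4; ++i) out.v[i] = plus_op(x.v[i], y.v[i]);  // direct call
    } else {
        for (int i = 0; i < 4; ++i) out.v[i] = f(x.v[i], y.v[i]);        // indirect call
    }
    return out;
}
```

The fallback branch keeps the helper correct for any callable; only the guarded branch benefits from the direct call.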