https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125547

            Bug ID: 125547
           Summary: Missed devirtualisation through std::function
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
                CC: hubicka at gcc dot gnu.org
  Target Milestone: ---

The following C++ testcase:
#include <functional>
#include <cstddef>
struct V4 { float v[4]; };
static float plus_op(float a, float b) { return a + b; }

static inline V4 loop4_stdfunc(const std::function<float(float,float)> &f,
                               const V4 &x, const V4 &y) {
  V4 out;
  for (int i = 0; i < 4; ++i) out.v[i] = f(x.v[i], y.v[i]);
  return out;
}
template <class F>
static inline V4 loop4_tmpl(F f, const V4 &x, const V4 &y) {
  V4 out;
  for (int i = 0; i < 4; ++i) out.v[i] = f(x.v[i], y.v[i]);
  return out;
}
void add_stdfunction(V4 *__restrict o, const V4 *__restrict a,
                     const V4 *__restrict b, size_t n) {
  for (size_t i = 0; i < n; ++i) o[i] = loop4_stdfunc(plus_op, a[i], b[i]);
}
void add_template(V4 *__restrict o, const V4 *__restrict a,
                  const V4 *__restrict b, size_t n) {
  for (size_t i = 0; i < n; ++i) o[i] = loop4_tmpl(plus_op, a[i], b[i]);
}

is devirtualised by LLVM, which goes on to generate a vectorised form for
add_stdfunction, but GCC fails to devirtualise the std::function call and
generates an indirect branch instead:
https://godbolt.org/z/h7bKzM7W9
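
For reference, a minimal sketch of what successful devirtualisation amounts to
here. The names add_via_stdfunction and add_direct are hypothetical and not
part of the testcase. add_direct is a hand-written approximation of the loop a
compiler could reach once it proves the std::function always wraps plus_op; it
is not actual output from either compiler.

```cpp
#include <cstddef>
#include <functional>

struct V4 { float v[4]; };

static float plus_op(float a, float b) { return a + b; }

// Source-level shape of the testcase: each element is computed through
// std::function's type-erased call operator.
void add_via_stdfunction(V4 *o, const V4 *a, const V4 *b, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    // As in the testcase, the wrapper is built from plus_op on every
    // outer iteration.
    const std::function<float(float, float)> f = plus_op;
    for (int j = 0; j < 4; ++j) o[i].v[j] = f(a[i].v[j], b[i].v[j]);
  }
}

// Hypothetical hand-devirtualised equivalent: the call target is known to be
// plus_op, so the call collapses to a plain float addition that a
// vectoriser can handle.
void add_direct(V4 *o, const V4 *a, const V4 *b, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    for (int j = 0; j < 4; ++j) o[i].v[j] = a[i].v[j] + b[i].v[j];
}
```

Both functions should compute the same results for the same inputs; the
difference the report is about lies only in the generated code.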

This costs us a lot of performance on the 772.marian_r benchmark from
SPEC2026, which makes heavy use of these std::function abstractions to
simulate vectors.
