[Bug tree-optimization/122219] Missed store sinking when using memcpy with vector_size type in inlined functions

pfustc at gcc dot gnu.org via Gcc-bugs Fri, 17 Oct 2025 21:40:17 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122219


--- Comment #5 from Pengfei Li <pfustc at gcc dot gnu.org> ---
I might have reduced it too far. The original code was written with x86
intrinsics and compiled with the SIMDe (SIMD Everywhere) library. A case
closer to the original (below) has no undefined data.

#define SIMDE_ENABLE_NATIVE_ALIASES

#include "simde/x86/avx2.h"

void foo(__m256& v, unsigned int n) {
  __m128 f0 = {1.0f, 2.0f, 3.0f, 4.0f};
  __m128 f1 = {5.0f, 6.0f, 7.0f, 8.0f};
  for (int i = 0; i < n; i++) {
    f0 = f0 + f0;
    f1 = f1 + f1;
    v = _mm256_castps128_ps256(f0);
    v = _mm256_insertf128_ps(v, f1, 1);
  }
}

Even with -fstack-reuse=none, the stores don’t sink.

I acknowledge the workload code isn’t well written. We can either move the
assignments to v out of the loop or use _mm256_set_m128 instead of the cast +
insert. However, we also observed that LLVM can sink the stores even for this
case, so perhaps there's still room for optimization in GCC.
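
For illustration, the first workaround (moving the assignments to v out of the
loop) could look roughly like the sketch below. It is not the SIMDe code: v4f,
v8f, set_halves and foo_hoisted are hypothetical names, the types are plain
GCC/Clang vector_size vectors standing in for __m128/__m256, and the two memcpy
calls only approximate what a portable _mm256_set_m128 might expand to.

```cpp
#include <cstring>

// Hypothetical stand-ins for __m128 / __m256 (GCC/Clang vector extension).
typedef float v4f __attribute__((vector_size(16)));
typedef float v8f __attribute__((vector_size(32)));

// Hypothetical helper: fill a 256-bit vector from two 128-bit halves,
// assumed here to resemble a portable _mm256_set_m128(hi, lo).
static inline void set_halves(v8f& out, v4f lo, v4f hi) {
  char* p = reinterpret_cast<char*>(&out);
  std::memcpy(p, &lo, sizeof lo);              // lower 128 bits
  std::memcpy(p + sizeof lo, &hi, sizeof hi);  // upper 128 bits
}

// Same arithmetic as foo, but v is written once, after the loop.
void foo_hoisted(v8f& v, unsigned int n) {
  v4f f0 = {1.0f, 2.0f, 3.0f, 4.0f};
  v4f f1 = {5.0f, 6.0f, 7.0f, 8.0f};
  for (unsigned int i = 0; i < n; i++) {
    f0 = f0 + f0;
    f1 = f1 + f1;
  }
  // The original loop leaves v untouched when n == 0, so keep that behavior.
  if (n > 0)
    set_halves(v, f0, f1);
}
```

This is the transformation we would hope store sinking performs automatically.
The sketch only shows the intended source-level result and says nothing about
the code GCC actually generates for it.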
