https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61810
Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #8 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> But it looks like the testcase is broken:
>
> __attribute__((always_inline, target("avx2")))
> static __m256i
> load8bit_4x4_avx2(const uint8_t *const src, const uint32_t stride)
> {
>   __m128i src01, src23;
>   src01 = _mm_cvtsi32_si128(*(int32_t*)(src + 0 * stride));
>   src23 = _mm_insert_epi32(src23, *(int32_t *)(src + 3 * stride), 1);
>   return _mm256_setr_m128i(src01, src23);
> }
>
> it seems to expect that src23 is zero before inserting the data?

If you look in the original PR 104441 testcase, it has sensible code:

static __m256i
__attribute__((always_inline))
load8bit_4x4_avx2(const uint8_t *const src, const uint32_t stride)
{
  __m128i src01, src23;
  src01 = _mm_cvtsi32_si128(*(int32_t*)(src + 0 * stride));
  src01 = _mm_insert_epi32(src01, *(int32_t *)(src + 1 * stride), 1);
  src23 = _mm_cvtsi32_si128(*(int32_t*)(src + 2 * stride));
  src23 = _mm_insert_epi32(src23, *(int32_t *)(src + 3 * stride), 1);
  return _mm256_setr_m128i(src01, src23);
}
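
For illustration only, here is a scalar sketch of what the PR 104441 version computes, viewed as eight 32-bit lanes. It is not part of either testcase: the names vec8x32, load32 and load8bit_4x4_model are hypothetical, and memcpy is swapped in for the pointer casts so the sketch has no alignment or aliasing concerns. It assumes that _mm_cvtsi32_si128 zeroes the upper three lanes of its result and that _mm256_setr_m128i places its first argument in the low half. In the reduced testcase the stores to lanes 1 and 4 are missing, and lane 4 plus the zeroing of the high half come from the uninitialized src23.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a __m256i viewed as eight 32-bit lanes. */
typedef struct { int32_t lane[8]; } vec8x32;

/* Load four bytes as an int32_t; memcpy replaces the *(int32_t *) casts. */
static int32_t load32(const uint8_t *p)
{
    int32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Scalar model of the PR 104441 load8bit_4x4_avx2 (illustrative only). */
static vec8x32 load8bit_4x4_model(const uint8_t *src, uint32_t stride)
{
    vec8x32 r = {{0}};                     /* cvtsi32 zeroes lanes above lane 0 */
    r.lane[0] = load32(src + 0 * stride);  /* src01 = cvtsi32(row 0)            */
    r.lane[1] = load32(src + 1 * stride);  /* src01 = insert(src01, row 1, 1)   */
    r.lane[4] = load32(src + 2 * stride);  /* src23 = cvtsi32(row 2)            */
    r.lane[5] = load32(src + 3 * stride);  /* src23 = insert(src23, row 3, 1)   */
    return r;                              /* setr: src01 low, src23 high       */
}
```

With every lane written or zeroed before use, the result does not depend on the prior contents of src01 or src23.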