https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322
--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Mathieu Malaterre from comment #9)
> Technically I can also execute the `uint16` portion of the unit test and
> produce a failure (so this seems to be consistent behavior with signed
> counterpart):
>
> ```
> HWY_NOINLINE void TestAllMulHigh() {
>   ForPartialVectors<TestMulHigh> test;
>   // test(int16_t());
>   test(uint16_t());
> }

As this is a runtime failure, you will have to provide a (minimized) runtime
testcase.

I took a quick look at the sources, and it looks to me that a testcase can be
obtained with the following procedure:

Use tests/mul_tests.cc and strip out as many lines as possible. Above the part
that you show there are several other tests; please find out which one fails.
As can be seen from the test run, the failure is in the 128-bit emulation part.
These operations are in hwy/ops/emu128-inl.h, specifically:

--cut here--
template <size_t N>
HWY_API Vec128<uint16_t, N> MulHigh(Vec128<uint16_t, N> a,
                                    const Vec128<uint16_t, N> b) {
  for (size_t i = 0; i < N; ++i) {
    // Cast to uint32_t first to prevent overflow. Otherwise the result of
    // uint16_t * uint16_t is in "int" which may overflow. In practice the
    // result is the same but this way it is also defined.
    a.raw[i] = static_cast<uint16_t>(
        (static_cast<uint32_t>(a.raw[i]) * static_cast<uint32_t>(b.raw[i])) >>
        16);
  }
  return a;
}
--cut here--

Put everything together in one file, check that it still fails, and you have a
testcase. Simplify it as much as possible; if you can convert it to plain C,
the testcase will be much easier to analyse.

The reason the test fails with gcc-12 is that gcc-12 enabled
auto-vectorisation at -O2. The failure suggests there is some issue with the
vectorisation of the above code, or perhaps with the preparation of the test
values before the loop.
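Following the procedure above, a reduction could start by lifting the MulHigh loop out of Highway altogether. This is only a hypothetical sketch: `VecU16`, `MulHighU16`, `MulHighRef` and `CountMismatches` are illustrative names, not Highway's; the input pattern is arbitrary rather than the values mul_tests.cc generates; and it has not been confirmed to reproduce the gcc-12 failure.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for Highway's Vec128<uint16_t, N>: just the raw lanes.
template <size_t N>
struct VecU16 {
  uint16_t raw[N];
};

// Same loop body as the emu128 MulHigh quoted above. noinline keeps the loop
// in its own function so its vectorised code is easy to find and compare.
template <size_t N>
__attribute__((noinline)) VecU16<N> MulHighU16(VecU16<N> a,
                                               const VecU16<N> b) {
  for (size_t i = 0; i < N; ++i) {
    a.raw[i] = static_cast<uint16_t>(
        (static_cast<uint32_t>(a.raw[i]) * static_cast<uint32_t>(b.raw[i])) >>
        16);
  }
  return a;
}

// Scalar reference for one lane. The volatile product keeps the compiler from
// folding this computation into the loop under test.
static uint16_t MulHighRef(uint16_t x, uint16_t y) {
  volatile uint32_t p = static_cast<uint32_t>(x) * static_cast<uint32_t>(y);
  return static_cast<uint16_t>(p >> 16);
}

// Fills the inputs with an arbitrary (hypothetical) pattern and returns the
// number of lanes where the loop result disagrees with the scalar reference.
template <size_t N>
int CountMismatches() {
  VecU16<N> a, b;
  for (size_t i = 0; i < N; ++i) {
    a.raw[i] = static_cast<uint16_t>(0xFFFF - 3 * i);
    b.raw[i] = static_cast<uint16_t>(0x8000 + 7 * i);
  }
  const VecU16<N> r = MulHighU16(a, b);  // a and b are passed by value
  int bad = 0;
  for (size_t i = 0; i < N; ++i) {
    if (r.raw[i] != MulHighRef(a.raw[i], b.raw[i])) ++bad;
  }
  return bad;
}
```

A driver `main` would call `CountMismatches<1>()`, `<2>()`, `<4>()` and `<8>()` (the lane counts that fit in a 128-bit vector of uint16_t) and call `__builtin_abort()` on any nonzero result. Building it once with `-O2` and once with `-O2 -fno-tree-vectorize` would show whether a mismatch depends on the vectorizer. If this reduction does not fail, the test value preparation from mul_tests.cc is the next piece to carry over.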