https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106322

--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Mathieu Malaterre from comment #9)

> Technically I can also execute the `uint16` portion of the unit test and
> produce a failure (so this seems to be consistent behavior with signed
> counterpart):
> 
> ```
> HWY_NOINLINE void TestAllMulHigh() {
>   ForPartialVectors<TestMulHigh> test;
> //  test(int16_t());
>   test(uint16_t());
> }
> ```


As this is a runtime failure, you will have to provide a (minimized) runtime
testcase. I took a quick look at the sources, and it looks to me like the
following procedure can produce one:

Take tests/mul_tests.cc and strip out as many lines as possible. There are
several tests above the part that you show; please find out which one fails.

As can be seen from the test run, the failure is in the 128-bit emulation
part. These operations are in hwy/ops/emu128-inl.h, specifically:

--cut here--
template <size_t N>
HWY_API Vec128<uint16_t, N> MulHigh(Vec128<uint16_t, N> a,
                                    const Vec128<uint16_t, N> b) {
  for (size_t i = 0; i < N; ++i) {
    // Cast to uint32_t first to prevent overflow. Otherwise the result of
    // uint16_t * uint16_t is in "int" which may overflow. In practice the
    // result is the same but this way it is also defined.
    a.raw[i] = static_cast<uint16_t>(
        (static_cast<uint32_t>(a.raw[i]) * static_cast<uint32_t>(b.raw[i])) >>
        16);
  }
  return a;
}
--cut here--

Put everything together in one file, check that it still fails, and you have a
testcase. Then simplify it as much as possible; if you can convert it to plain
C, the testcase will be much easier to analyse.
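Following that suggestion, a plain-C reduction of the MulHigh loop might look like the sketch below. It assumes 8 lanes of uint16_t (one 128-bit vector); the names `mul_high_u16`, `mul_high_u16_selftest` and `LANES` and the input values are hypothetical, not taken from mul_tests.cc. The expected results were worked out by hand from (a * b) >> 16, e.g. 0xFFFF * 0xFFFF = 0xFFFE0001, whose high half is 0xFFFE. That product is also larger than INT_MAX, which is why the original casts to uint32_t before multiplying.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* hypothetical: 8 x uint16_t = one 128-bit vector */

/* Plain-C analogue of the emu128 MulHigh loop (a sketch, not the Highway
   source). noinline keeps the loop in its own function. */
__attribute__((noinline)) void mul_high_u16(uint16_t *a, const uint16_t *b,
                                            size_t n) {
  for (size_t i = 0; i < n; ++i)
    /* Widen to uint32_t so the product cannot overflow a signed int. */
    a[i] = (uint16_t)(((uint32_t)a[i] * (uint32_t)b[i]) >> 16);
}

/* Returns the number of lanes that differ from hand-computed results. */
int mul_high_u16_selftest(void) {
  uint16_t a[LANES] = {0xFFFF, 0x8000, 1234, 0, 0xFFFF, 0x0100, 0x4000, 1};
  const uint16_t b[LANES] = {0xFFFF, 2, 5678, 0xFFFF, 1, 0x0100, 4, 0xFFFF};
  const uint16_t expected[LANES] = {0xFFFE, 1, 106, 0, 0, 1, 1, 0};
  int mismatches = 0;

  mul_high_u16(a, b, LANES);
  for (size_t i = 0; i < LANES; ++i)
    if (a[i] != expected[i])
      ++mismatches;
  return mismatches;
}
```

Wrapped in a `main` that calls `abort()` when `mul_high_u16_selftest()` returns non-zero, this has the usual shape of a GCC runtime testcase. Building it with gcc-12 at -O2 and again with -O2 -fno-tree-vectorize would show whether the loop alone reproduces the failure.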

The reason the test fails with gcc-12 is that gcc-12 enabled auto-vectorisation
at -O2 (with the "very-cheap" cost model). The failure suggests a problem with
the vectorisation of the above code, or perhaps with the preparation of the
test values before the loop.
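The vectorisation hypothesis can be checked with a differential test. Compile the same loop twice, once with the vectoriser switched off for that function only through GCC's `optimize` function attribute, and compare the two over many inputs. This is a sketch: the function names, `COUNT`, and the LCG used to generate inputs are hypothetical choices, and the `optimize` attribute is a GCC extension suited to throwaway testcases rather than production code.

```c
#include <stddef.h>
#include <stdint.h>

#define COUNT 1024 /* hypothetical length: covers the vector body and epilogue */

/* Candidate: built with the default options, so gcc-12 -O2 may vectorise it. */
__attribute__((noinline)) void mul_high_vec(uint16_t *a, const uint16_t *b,
                                            size_t n) {
  for (size_t i = 0; i < n; ++i)
    a[i] = (uint16_t)(((uint32_t)a[i] * (uint32_t)b[i]) >> 16);
}

/* Reference: the same loop with -fno-tree-vectorize applied to this
   function only (GCC "optimize" attribute; other compilers may ignore it). */
__attribute__((noinline, optimize("no-tree-vectorize")))
void mul_high_ref(uint16_t *a, const uint16_t *b, size_t n) {
  for (size_t i = 0; i < n; ++i)
    a[i] = (uint16_t)(((uint32_t)a[i] * (uint32_t)b[i]) >> 16);
}

/* Returns the index of the first mismatch, or -1 if both versions agree. */
long compare_mul_high(void) {
  static uint16_t a_vec[COUNT], a_ref[COUNT], b[COUNT];
  uint32_t s = 12345u; /* deterministic LCG seed */

  for (size_t i = 0; i < COUNT; ++i) {
    s = s * 1103515245u + 12345u;
    a_vec[i] = a_ref[i] = (uint16_t)(s >> 16);
    s = s * 1103515245u + 12345u;
    b[i] = (uint16_t)(s >> 16);
  }
  mul_high_vec(a_vec, b, COUNT);
  mul_high_ref(a_ref, b, COUNT);
  for (size_t i = 0; i < COUNT; ++i)
    if (a_vec[i] != a_ref[i])
      return (long)i;
  return -1;
}
```

If this reports a mismatch at -O2, the vectoriser is implicated. If both versions agree, the preparation of the test values before the loop becomes the more likely suspect.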
