https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125648

            Bug ID: 125648
           Summary: std::simd::partial_load produces incorrect result
           Product: gcc
           Version: 16.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at wildp dot net
  Target Milestone: ---

When calling a non-mask overload of `std::simd::partial_load` at run time to
load exactly 4 elements of type char, the result is incorrect when compiled
with -march=x86-64, -march=x86-64-v2, or -march=x86-64-v3.

Here is a test case which fails when compiled with -std=c++26 -O0
-march=x86-64:
```c++
#include <simd>
#include <cassert>

int main() {
  constexpr const char* cstr{ "abcd" };
  constexpr auto v1 = std::simd::partial_load<std::simd::vec<char>>(cstr, 4);
  auto v2 = std::simd::partial_load<std::simd::vec<char>>(cstr, 4);
  assert(all_of(v1 == v2));
}
```

Here, v2 should be equal to ['a', 'b', 'c', 'd', '\0' (x12) ],
but is actually equal to ['a', 'b', '\0', 'd', '\0' (x12) ].
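
For reference, the expected result can be modelled with a plain scalar loop.
This is only a sketch: `reference_partial_load` is a hypothetical helper and
not part of <simd>, and the width of 16 used below is an assumption taken from
the 4 + 12 elements listed above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model of the expected result; not part of <simd>.
// Copies the first n elements from p and leaves the remaining ones zeroed.
template <std::size_t N>
constexpr std::array<char, N> reference_partial_load(const char* p, std::size_t n) {
  std::array<char, N> out{};  // every element starts as '\0'
  for (std::size_t i = 0; i < n && i < N; ++i)
    out[i] = p[i];            // copy only the requested prefix
  return out;
}
```

With this model, `reference_partial_load<16>("abcd", 4)` yields 'a', 'b', 'c',
'd' followed by twelve '\0' elements, which is what v1 contains.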

When `partial_load` with the same arguments is invoked at compile time, the
correct result is produced, hence v1 != v2 above. This issue only appears when
4 elements are loaded; for all other sizes the assertion holds.

I've traced this issue back to `__memcpy_chunks` in simd_details.h. I suspect
the `1`s on lines 1343 and 1344 should actually be `2`s, although there are
probably other ways of fixing this too.
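
As an illustration only, the sketch below shows how a `1` in place of a `2`
could produce exactly this symptom in a copy built from two overlapping
chunks. It is not the libstdc++ code and has not been checked against it;
`copy_2_to_4` and its `tail` parameter are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, NOT the libstdc++ source. Copies n bytes (2 <= n <= 4)
// as a 2-byte head chunk plus a chunk covering the last `tail` bytes.
// tail == 2 is the correct width; tail == 1 models the suspected mistake.
inline void copy_2_to_4(char* dst, const char* src, std::size_t n, std::size_t tail) {
  std::memcpy(dst, src, 2);                           // head: bytes [0, 2)
  std::memcpy(dst + n - tail, src + n - tail, tail);  // tail: bytes [n - tail, n)
}
```

In this model, `tail == 1` still copies n == 2 and n == 3 correctly, because
the head chunk already covers bytes 0 and 1. For n == 4 it never writes byte 2,
so a zeroed destination ends up as 'a', 'b', '\0', 'd'.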
