https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125648
Bug ID: 125648
Summary: std::simd::partial_load produces incorrect result
Product: gcc
Version: 16.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: peter at wildp dot net
Target Milestone: ---
When a non-mask overload of `std::simd::partial_load` is called at run time to
load exactly 4 elements of type char, the result is incorrect when compiled
with -march=x86-64, -march=x86-64-v2, or -march=x86-64-v3.
Here is a test case which fails when compiled with -std=c++26 -O0
-march=x86-64:
```c++
#include <simd>
#include <cassert>
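int main() {
    constexpr const char* cstr{ "abcd" };
    // v1 is evaluated at compile time and holds the correct result.
    constexpr auto v1 = std::simd::partial_load<std::simd::vec<char>>(cstr, 4);
    // v2 is evaluated at run time and holds the incorrect result.
    auto v2 = std::simd::partial_load<std::simd::vec<char>>(cstr, 4);
    assert(all_of(v1 == v2));
}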
```
Here, v2 should be equal to ['a', 'b', 'c', 'd', '\0' (x12) ],
but is actually equal to ['a', 'b', '\0', 'd', '\0' (x12) ].
When `partial_load` with the same arguments is invoked at compile time, the
correct result is produced, hence v1 != v2 above. This issue only appears when
4 elements are loaded; for all other sizes the assertion holds.
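For comparison, the expected result can be modelled by a scalar reference: copy the first n elements and leave every remaining lane zero. This is a minimal sketch, not the libstdc++ implementation. It assumes a 16-lane vector (matching the 16 elements listed above), and `kLanes` and `reference_partial_load` are hypothetical names introduced only for illustration.

```c++
#include <array>
#include <cstddef>

// Hypothetical lane count: 16 chars, matching the result shown above.
constexpr std::size_t kLanes = 16;

// Scalar model of the expected result (an assumption, not library code):
// the first n elements are copied, all remaining lanes stay '\0'.
constexpr std::array<char, kLanes>
reference_partial_load(const char* src, std::size_t n) {
    std::array<char, kLanes> out{};  // value-initialised: every lane is '\0'
    for (std::size_t i = 0; i < n && i < kLanes; ++i)
        out[i] = src[i];
    return out;
}
```

With this model, `reference_partial_load("abcd", 4)` holds 'a', 'b', 'c', 'd' followed by twelve '\0', which is the value v2 is expected to have.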
I've traced this issue back to `__memcpy_chunks` in simd_details.h.
I suspect the `1`s on lines 1343 and 1344 of that file should actually be `2`s,
although there are probably other ways of fixing this too.
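One pattern that would produce exactly this symptom is a copy of 3 or 4 bytes done as a 2-byte head plus a tail that ends at the last element. The sketch below is an assumption made purely for illustration. It is not the code in simd_details.h, and `Buf` and `copy_3_or_4` are hypothetical names. With a 1-byte tail, n == 3 is still correct but n == 4 leaves index 2 unwritten. With a 2-byte tail, both sizes are correct.

```c++
#include <array>
#include <cstddef>

using Buf = std::array<char, 16>;  // hypothetical 16-byte destination

// Hypothetical model, NOT the libstdc++ code: copies n bytes (n == 3 or 4)
// as a 2-byte head plus a `tail`-byte chunk ending at the last element.
// tail == 1 models the suspected bug, tail == 2 the suggested fix.
constexpr Buf copy_3_or_4(const char* src, std::size_t n, std::size_t tail) {
    Buf dst{};                                   // all bytes start as '\0'
    for (std::size_t i = 0; i < 2; ++i)          // head: bytes [0, 2)
        dst[i] = src[i];
    for (std::size_t i = n - tail; i < n; ++i)   // tail: bytes [n - tail, n)
        dst[i] = src[i];
    return dst;
}
```

In this model, `copy_3_or_4("abcd", 4, 1)` leaves index 2 as '\0' while index 3 is 'd', matching the observed v2, and `copy_3_or_4("abcd", 4, 2)` fills all four bytes.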