https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88540
Bug ID: 88540
Summary: Issues with vectorization of min/max operations
Product: gcc
Version: 8.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: [email protected]
Target Milestone: ---
1st issue:
[code]
#define SIZE 2
void test(double* __restrict d1, double* __restrict d2, double* __restrict d3)
{
    for (int n = 0; n < SIZE; ++n)
    {
        d3[n] = d1[n] < d2[n] ? d1[n] : d2[n];
    }
}
[/code]
When this is compiled for SSE2, gcc produces non-vectorized code (the VEX-encoded mnemonics below show that AVX was enabled for this particular listing):
[asm]
test(double*, double*, double*):
        vmovsd  xmm0, QWORD PTR [rdi]
        vminsd  xmm0, xmm0, QWORD PTR [rsi]
        vmovsd  QWORD PTR [rdx], xmm0
        vmovsd  xmm0, QWORD PTR [rdi+8]
        vminsd  xmm0, xmm0, QWORD PTR [rsi+8]
        vmovsd  QWORD PTR [rdx+8], xmm0
        ret
ret
[/asm]
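For comparison, the SIZE 2 loop maps onto a single packed SSE2 operation. The sketch below writes that packed form out by hand with the SSE2 intrinsics from `<emmintrin.h>`; the function name `test_sse2` is hypothetical, and this is an illustration of the expected shape, not GCC output. `_mm_min_pd(a, b)` follows MINPD semantics and returns `b` in any lane where `a < b` is false (including NaN inputs), so it matches `d1[n] < d2[n] ? d1[n] : d2[n]` lane by lane.

[code]
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical hand-vectorized equivalent of test() with SIZE == 2:
   one 128-bit load per input, one packed MINPD, one 128-bit store. */
void test_sse2(double *__restrict d1, double *__restrict d2,
               double *__restrict d3)
{
    __m128d a = _mm_loadu_pd(d1); /* d1[0], d1[1] */
    __m128d b = _mm_loadu_pd(d2); /* d2[0], d2[1] */
    /* Each lane computes a < b ? a : b, same as the scalar loop. */
    _mm_storeu_pd(d3, _mm_min_pd(a, b));
}
[/code]

The packed version needs one `minpd` where the listing above uses two `vminsd`.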
When SIZE is changed to 3 or greater, the code is vectorized properly. I thought
this might be a workaround for older CPUs on which the vector form was slower,
but it also happens when compiling with "-O3 -march=skylake". I also checked
with SIZE 6 and got one AVX (256-bit) operation plus two scalar SSE ones, so the
last two elements, which would fit in a single 128-bit vector, are left scalar.
This looks like an off-by-one bug.
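The SIZE 6 result fits the off-by-one reading: six doubles split into one 256-bit group of four plus a remainder of two, and that remainder fits exactly in one 128-bit register. A hand-written sketch of the fully packed form follows; it assumes an x86-64 target, GCC or Clang (for the `target` attribute) and a CPU with AVX, and the name `test6_avx` is hypothetical.

[code]
#include <immintrin.h> /* AVX intrinsics */

/* Hypothetical fully vectorized form of test() with SIZE == 6.
   target("avx") enables AVX for this function only, so the file
   does not need to be compiled with -mavx. */
__attribute__((target("avx")))
void test6_avx(double *__restrict d1, double *__restrict d2,
               double *__restrict d3)
{
    /* Elements 0..3: one 256-bit VMINPD on YMM registers. */
    __m256d a4 = _mm256_loadu_pd(d1);
    __m256d b4 = _mm256_loadu_pd(d2);
    _mm256_storeu_pd(d3, _mm256_min_pd(a4, b4));

    /* Elements 4..5: the two-element remainder still fits one
       128-bit VMINPD on XMM registers, so no scalar tail is needed. */
    __m128d a2 = _mm_loadu_pd(d1 + 4);
    __m128d b2 = _mm_loadu_pd(d2 + 4);
    _mm_storeu_pd(d3 + 4, _mm_min_pd(a2, b2));
}
[/code]

In the reported SIZE 6 output, the work done by the last three statements here is instead emitted as two scalar SSE operations.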
The same issue occurs with the other relational operators (>, <=, >=).
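Each of the other comparisons also has a direct 2-lane SSE2 form. The sketch below uses a compare-and-select pattern (a lane mask from `_mm_cmp*_pd`, then AND/ANDNOT/OR), which reproduces `cond ? d1[n] : d2[n]` lane by lane, including the NaN case where every ordered comparison is false and the second operand is chosen. This is one possible lowering picked for clarity, not necessarily the sequence GCC would emit; `select_pd`, `DEFINE_SELECT` and the `sel_*` functions are hypothetical names.

[code]
#include <emmintrin.h> /* SSE2 intrinsics */

/* Lane-wise select: mask ? a : b, where each mask lane is all ones
   (comparison true) or all zeros (comparison false). */
static inline __m128d select_pd(__m128d mask, __m128d a, __m128d b)
{
    return _mm_or_pd(_mm_and_pd(mask, a), _mm_andnot_pd(mask, b));
}

/* Defines name(d1, d2, d3): d3[n] = cmp(d1[n], d2[n]) ? d1[n] : d2[n]
   for n = 0, 1, using one packed compare instead of two scalar ones. */
#define DEFINE_SELECT(name, cmp)                                    \
    void name(const double *d1, const double *d2, double *d3)       \
    {                                                               \
        __m128d a = _mm_loadu_pd(d1);                               \
        __m128d b = _mm_loadu_pd(d2);                               \
        _mm_storeu_pd(d3, select_pd(cmp(a, b), a, b));              \
    }

DEFINE_SELECT(sel_lt, _mm_cmplt_pd) /* d1 <  d2 ? d1 : d2 */
DEFINE_SELECT(sel_gt, _mm_cmpgt_pd) /* d1 >  d2 ? d1 : d2 */
DEFINE_SELECT(sel_le, _mm_cmple_pd) /* d1 <= d2 ? d1 : d2 */
DEFINE_SELECT(sel_ge, _mm_cmpge_pd) /* d1 >= d2 ? d1 : d2 */
[/code]

A dedicated `minpd`/`maxpd` is shorter where its semantics match exactly (as for `<` and `>`); the generic select is shown so that all four operators use the same shape.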
2nd issue: when compiling for AVX-512, gcc does not use the new instructions
that operate on ZMM registers; it still generates code using YMM registers.
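For reference, a 512-bit version handles eight doubles per instruction in a ZMM register. The sketch below assumes GCC or Clang, a CPU with AVX-512F, and a hypothetical SIZE of 8; `test8_avx512` is an illustrative name. One possibly relevant detail, not confirmed in this report: GCC 8 added `-mprefer-vector-width=`, and GCC's tuning for some AVX-512 CPUs prefers 256-bit vectors by default, which would yield YMM-only code.

[code]
#include <immintrin.h> /* AVX-512F intrinsics */

/* Hypothetical SIZE == 8 variant of test() using one 512-bit VMINPD.
   target("avx512f") enables AVX-512F for this function only. */
__attribute__((target("avx512f")))
void test8_avx512(double *__restrict d1, double *__restrict d2,
                  double *__restrict d3)
{
    __m512d a = _mm512_loadu_pd(d1); /* d1[0..7] in one ZMM register */
    __m512d b = _mm512_loadu_pd(d2); /* d2[0..7] */
    /* Same MINPD semantics per lane: a < b ? a : b */
    _mm512_storeu_pd(d3, _mm512_min_pd(a, b));
}
[/code]

Compiling the original loop with `-O3 -march=skylake-avx512 -mprefer-vector-width=512` would be one way to check whether the YMM choice comes from a tuning default rather than a missed optimization.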