https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125293

            Bug ID: 125293
           Summary: equivalent loops, different vectorization
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: manu at gcc dot gnu.org
  Target Milestone: ---

GCC vectorizes both for loops in the first function but not the single for loop
in the second function, even though the two functions are equivalent. If
vectorization is profitable at all, the single for loop is surely more profitable.

```
#include <stddef.h>

void
matrix_transpose_double(double * restrict dst, const double * restrict src,
                        const size_t nrows, const size_t ncols)
{
    if (nrows == 0 || ncols == 0)
        return;

    const size_t len_1 = (nrows * ncols) - 1;
    size_t i = 0, j = 0;
    for (; j <= len_1; i++, j += nrows)
        dst[j] = src[i];
    j -= len_1;
    while (i <= len_1) {
        for (; j <= len_1; i++, j += nrows)
            dst[j] = src[i];
        j -= len_1;
    }
}

void
matrix_transpose_double_2(double * restrict dst, const double * restrict src,
                        const size_t nrows, const size_t ncols)
{
    if (nrows == 0 || ncols == 0)
        return;

    const size_t len_1 = (nrows * ncols) - 1;
    size_t i = 0, j = 0;
    do {
        for (; j <= len_1; i++, j += nrows)
            dst[j] = src[i];
        j -= len_1;

    } while (i <= len_1);
}
```
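
The equivalence claim can be checked mechanically. The following is only a sketch, not part of the original test case: it repeats the two functions from the listing above and adds a hypothetical helper, `transposes_agree` (my name, not from the report), which compares both results against a naive double-loop transpose. It assumes `src` is an `nrows` x `ncols` row-major matrix and `dst` receives the `ncols` x `nrows` transpose, which is how I read the index arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* First variant from the report: the first pass is peeled out of the loop. */
static void
matrix_transpose_double(double * restrict dst, const double * restrict src,
                        const size_t nrows, const size_t ncols)
{
    if (nrows == 0 || ncols == 0)
        return;

    const size_t len_1 = (nrows * ncols) - 1;
    size_t i = 0, j = 0;
    for (; j <= len_1; i++, j += nrows)
        dst[j] = src[i];
    j -= len_1;
    while (i <= len_1) {
        for (; j <= len_1; i++, j += nrows)
            dst[j] = src[i];
        j -= len_1;
    }
}

/* Second variant from the report: the same passes written as one do-while. */
static void
matrix_transpose_double_2(double * restrict dst, const double * restrict src,
                          const size_t nrows, const size_t ncols)
{
    if (nrows == 0 || ncols == 0)
        return;

    const size_t len_1 = (nrows * ncols) - 1;
    size_t i = 0, j = 0;
    do {
        for (; j <= len_1; i++, j += nrows)
            dst[j] = src[i];
        j -= len_1;
    } while (i <= len_1);
}

/* Hypothetical helper (not in the report): returns 1 if both variants
   produce the same output as a naive transpose, 0 otherwise. */
int
transposes_agree(const size_t nrows, const size_t ncols)
{
    assert(nrows > 0 && ncols > 0);

    const size_t len = nrows * ncols;
    double *src = malloc(len * sizeof *src);
    double *out1 = calloc(len, sizeof *out1);
    double *out2 = calloc(len, sizeof *out2);
    double *ref = calloc(len, sizeof *ref);
    int ok = 0;

    if (src && out1 && out2 && ref) {
        /* Distinct integer-valued entries so any misplaced element shows up. */
        for (size_t k = 0; k < len; k++)
            src[k] = (double) k;

        /* Naive reference: element (r, c) of src goes to (c, r) of ref. */
        for (size_t r = 0; r < nrows; r++)
            for (size_t c = 0; c < ncols; c++)
                ref[c * nrows + r] = src[r * ncols + c];

        matrix_transpose_double(out1, src, nrows, ncols);
        matrix_transpose_double_2(out2, src, nrows, ncols);

        ok = memcmp(out1, ref, len * sizeof *ref) == 0
             && memcmp(out2, ref, len * sizeof *ref) == 0;
    }

    free(src);
    free(out1);
    free(out2);
    free(ref);
    return ok;
}
```

Calling `transposes_agree(3, 4)` from a small `main` and checking for a nonzero result is enough to exercise both variants. Agreement on a few shapes does not prove equivalence, but it supports reading the vectorization difference as a missed optimization rather than a semantic difference.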

Compiling with

gcc -o test test.c -O3 -fopt-info-vec-all -march=x86-64-v2

gives:

<source>:12:14: optimized: loop vectorized using 16 byte vectors and unroll factor 2
<source>:15:14: missed: couldn't vectorize loop
<source>:15:14: missed: not vectorized: unsupported control flow in loop.
<source>:16:18: optimized: loop vectorized using 16 byte vectors and unroll factor 2
<source>:15:14: missed: couldn't vectorize loop
<source>:15:14: missed: not vectorized: unsupported control flow in loop.
<source>:16:18: missed: couldn't vectorize loop
<source>:17:25: missed: not vectorized: no vectype for stmt: _14 = *_11;
 scalar_type: const double
<source>:12:14: missed: couldn't vectorize loop
<source>:13:21: missed: not vectorized: no vectype for stmt: _9 = *_6;
 scalar_type: const double
<source>:4:1: note: vectorized 2 loops in function.
<source>:7:20: note: ***** Analysis failed with vector mode V2DF
<source>:7:20: note: ***** The result for vector mode V16QI would be the same
<source>:7:20: note: ***** Re-trying analysis with vector mode V8QI
<source>:7:20: note: ***** Analysis failed with vector mode V8QI
<source>:7:20: note: ***** Re-trying analysis with vector mode V4QI
<source>:7:20: note: ***** Analysis failed with vector mode V4QI
<source>:36:13: missed: couldn't vectorize loop
<source>:36:13: missed: not vectorized: unsupported control flow in loop.
<source>:23:1: note: vectorized 0 loops in function.
<source>:26:20: note: ***** Analysis failed with vector mode V2DF
<source>:26:20: note: ***** The result for vector mode V16QI would be the same
<source>:26:20: note: ***** Re-trying analysis with vector mode V8QI
<source>:26:20: note: ***** Analysis failed with vector mode V8QI
<source>:26:20: note: ***** Re-trying analysis with vector mode V4QI
<source>:26:20: note: ***** Analysis failed with vector mode V4QI
