https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285

--- Comment #7 from vries at gcc dot gnu.org ---
(In reply to Richard Biener from comment #6)
> I thought that parallelizing vectorized loops is harder (you eventually get
> extra prologue and epilogue loops, etc).

Another example, par-4.c:
...
int __attribute__((noinline,noclone))
f (int argc, double *__restrict results, double *__restrict data, INDEX_TYPE n)
{
  double coeff = 12.2;

  for (INDEX_TYPE idx = 0; idx < n; idx++)
    results[idx] = coeff * data[idx];

  return !(results[argc] == 0.0);
}

#define nEvents 1000

#if defined (MAIN)
int
main (int argc)
{
  double results[nEvents] = {0};
  double data[nEvents] = {0};

  return f (argc, results, data, nEvents);
}
#endif
...

When not parallelizing, we vectorize without problems:
...
parloops_factor: 0, index_type: int:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long:
  vectorized: 1, parallelized: 0
...


When parallelizing, we generate both a low iteration count loop and a
split-off parallelized loop. The vectorizer vectorizes both loops (each of
which contains an epilogue):
...
parloops_factor: 2, index_type: int:
  vectorized: 2, parallelized: 1
parloops_factor: 2, index_type: long:
  vectorized: 2, parallelized: 1
parloops_factor: 2, index_type: unsigned long:
  vectorized: 2, parallelized: 1
...

Except in the case of unsigned int, in which case it only vectorizes the low
iteration count loop:
...
parloops_factor: 2, index_type: unsigned int:
  vectorized: 1, parallelized: 1
...
The other loop fails to vectorize in a fashion similar to that described for
par-2.c with INDEX_TYPE (unsigned) int.
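
For illustration only, the loop structure described above (a low iteration
count loop plus a split-off parallelized loop) can be approximated by hand as
in the sketch below. This is not the code parloops emits. The names
scale_chunk and scale_split and the MIN_ITERS_PER_THREAD threshold are
hypothetical, and the chunks run one after another here rather than on
separate threads, so that the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold: with fewer iterations than this per thread,
   the sketch stays on the sequential path.  */
#define MIN_ITERS_PER_THREAD 100

/* Stand-in for the split-off loop body: handles the half-open range
   [start, end).  A vectorizer would give this loop its own epilogue.  */
static void
scale_chunk (double *restrict results, const double *restrict data,
             double coeff, size_t start, size_t end)
{
  for (size_t idx = start; idx < end; idx++)
    results[idx] = coeff * data[idx];
}

void
scale_split (double *restrict results, const double *restrict data,
             double coeff, size_t n, size_t nthreads)
{
  assert (nthreads > 0);

  if (n < nthreads * MIN_ITERS_PER_THREAD)
    {
      /* Low iteration count loop: the original sequential loop.  */
      for (size_t idx = 0; idx < n; idx++)
        results[idx] = coeff * data[idx];
      return;
    }

  /* Split the iteration space into one contiguous chunk per thread.
     Assumption: a real implementation would hand each chunk to a
     different thread; here they simply run in sequence.  */
  size_t chunk = (n + nthreads - 1) / nthreads;
  for (size_t t = 0; t < nthreads; t++)
    {
      size_t start = t * chunk;
      size_t end = start + chunk < n ? start + chunk : n;
      if (start < end)
        scale_chunk (results, data, coeff, start, end);
    }
}
```

Both loops in the sketch have the same body, which matches the observation
that the vectorizer gets two candidate loops once parallelization is enabled.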
