https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65419

--- Comment #4 from vries at gcc dot gnu.org ---
When removing the fnspec from GOACC_data_start, we run into the problem that
this example no longer gets parallelized:
...
#include <stdlib.h>

#define N (1024 * 512)
#define COUNTERTYPE unsigned int

int
main (void)
{
  unsigned int *__restrict a;
  unsigned int *__restrict b;
  unsigned int *__restrict c;

  a = (unsigned int *)malloc (N * sizeof (unsigned int));
  b = (unsigned int *)malloc (N * sizeof (unsigned int));
  c = (unsigned int *)malloc (N * sizeof (unsigned int));

  for (COUNTERTYPE i = 0; i < N; i++)
    a[i] = i * 2;

  for (COUNTERTYPE i = 0; i < N; i++)
    b[i] = i * 4;

#pragma acc data copyin (a[0:N], b[0:N]) copyout (c[0:N])
  {
#pragma acc kernels present (a[0:N], b[0:N], c[0:N])
    {
      for (COUNTERTYPE ii = 0; ii < N; ii++)
        c[ii] = a[ii] + b[ii];
    }
  }

  for (COUNTERTYPE i = 0; i < N; i++)
    if (c[i] != a[i] + b[i])
      abort ();

  free (a);
  free (b);
  free (c);

  return 0;
}
...

In this sequence, we take the address of a and pass it to GOACC_data_start:
...
  .omp_data_arr.18.a = &a;
  __builtin_GOACC_data_start (-1, 6, &.omp_data_arr.18, &.omp_data_sizes.19,
                              &.omp_data_kinds.20);
...

Without the fnspec, we need to assume that a, whose address escapes into the
call, could be modified by __builtin_GOACC_data_start. That inhibits the
optimization, so the loop is no longer parallelized.
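To illustrate why the escaping address matters, here is a minimal C sketch. It
is not GCC or libgomp code: struct data_arr and the functions
redirecting_start, readonly_start, sum_after_start and run_demo are
hypothetical names made up for this illustration. Because &a is stored in a
record that is passed to a call, the compiler has to reload a after the call
unless it knows the callee cannot write through that pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the .omp_data_arr record: it holds the
   ADDRESS of the pointer variable, not the pointer value.  */
struct data_arr
{
  unsigned int **a;
};

static unsigned int replacement[4] = { 10, 20, 30, 40 };

/* Hypothetical callee doing what the compiler must assume an
   unannotated call might do: overwrite the caller's pointer.  */
static void
redirecting_start (struct data_arr *arr)
{
  *arr->a = replacement;
}

/* Hypothetical callee that only reads through its argument.  */
static void
readonly_start (struct data_arr *arr)
{
  (void) **arr->a;
}

static unsigned int
sum_after_start (unsigned int *a, size_t n,
                 void (*start) (struct data_arr *))
{
  struct data_arr arr;
  unsigned int sum = 0;

  arr.a = &a;    /* The address of a escapes into the record.  */
  start (&arr);  /* a may have changed here, so it must be reloaded.  */

  for (size_t i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Small self-check showing both behaviours.  */
static int
run_demo (void)
{
  unsigned int original[4] = { 1, 2, 3, 4 };

  /* The read-only callee leaves a alone: 1 + 2 + 3 + 4.  */
  assert (sum_after_start (original, 4, readonly_start) == 10);
  /* The redirecting callee swaps a: 10 + 20 + 30 + 40.  */
  assert (sum_after_start (original, 4, redirecting_start) == 100);
  return 0;
}
```

If the compiler could prove that only something like readonly_start is ever
called, it could keep using the value of a from before the call. As far as
this comment describes it, that is the kind of guarantee the fnspec was
supplying.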
