https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122790

            Bug ID: 122790
           Summary: conditional OMP SIMD clone does not handle large
                    simdlen properly
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

#pragma omp declare simd simdlen(32) inbranch
int __attribute__((const)) baz (int x);

short a[1024];

void __attribute__((noipa))
foo (int n, int * __restrict b)
{
  for (int i = 0; i < n; ++i)
    if (a[i])
      b[i] = baz (b[i]);
}

ICEs with

> ./cc1 -quiet t.c -O3 -mavx512bw -fopt-info-vec -fopenmp-simd --param vect-partial-vector-usage=2
t.c:9:21: optimized: loop vectorized using masked 64 byte vectors and unroll factor 32
during GIMPLE pass: vect
t.c: In function ‘foo’:
t.c:7:1: internal compiler error: in exact_div, at poly-int.h:2179
    7 | foo (int n, int * __restrict b)
      | ^~~
0x37eeeba internal_error(char const*, ...)
        ../../src/gcc/gcc/diagnostic-global-context.cc:787

There's a mismatch in the 'ncopies' we use for querying the mask argument and
also in the position used.  'ncopies' is the number of SIMD clone calls we emit.

For AVX512 masking

                  atype = bestn->simdclone->args[i].vector_type;
                  /* Guess the number of lanes represented by atype.  */
                  poly_uint64 atype_subparts
                    = exact_div (bestn->simdclone->simdlen,
                                 num_mask_args);
                  o = vector_unroll_factor (nunits, atype_subparts);

is wrong.  I do not see any info in the simdclone info as to which
"multiplicity" there is on arguments, in particular the mask argument.
For a vector data type we can use vector_type and simdlen, but for
SIMD_CLONE_ARG_TYPE_MASK there is not enough info(?).
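To illustrate why the lane guess is fragile, here is a minimal standalone sketch on plain integers.  It is not GCC code: checked_exact_div and guess_mask_unroll are hypothetical helpers, the sketch assumes vector_unroll_factor requires exact divisibility just like exact_div, and the values in the usage note are illustrative rather than taken from the failing compilation.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for exact_div on plain integers.  The real helper
// asserts that A is a multiple of B; here a violated precondition is
// reported as std::nullopt instead of an ICE.
static std::optional<uint64_t>
checked_exact_div (uint64_t a, uint64_t b)
{
  if (b == 0 || a % b != 0)
    return std::nullopt;
  return a / b;
}

// Mirrors the two divisions in the snippet above: first guess the number
// of lanes per mask argument, then derive the unroll factor from nunits.
// Assumption: vector_unroll_factor also needs an exact division.
static std::optional<uint64_t>
guess_mask_unroll (uint64_t simdlen, uint64_t num_mask_args, uint64_t nunits)
{
  std::optional<uint64_t> atype_subparts
    = checked_exact_div (simdlen, num_mask_args);
  if (!atype_subparts)
    return std::nullopt;
  return checked_exact_div (nunits, *atype_subparts);
}
```

With simdlen 32 and two mask arguments the guess is 16 lanes each, so nunits 32 gives a factor of 2.  With a single mask argument the guess is 32 lanes, and any nunits below 32 (say 16) no longer divides exactly, which is the kind of precondition failure exact_div reports.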
