https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122723

            Bug ID: 122723
           Summary: Oddities around mask support with .COND_ADD reductions
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

double foo (double *a, char *mask, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    {
      double val;
      if (mask[i])
        val = a[i];
      else
        val = -0.0;
      sum = sum + val;
    }
  return sum;
}

With -Ofast -march=znver4 we get

t.c:4:21: optimized: loop vectorized using 64 byte vectors and unroll factor 64

and no vector epilog.  With -O3 -march=znver4 instead

t.c:4:21: optimized: loop vectorized using 64 byte vectors and unroll factor 64
t.c:4:21: optimized: epilogue loop vectorized using masked 64 byte vectors and unroll factor 64

The former is due to

t.c:4:21: note:   using single def-use cycle for reduction by reducing multiple vectors to one in the loop body
vect_model_reduction_cost: inside_cost = 0, prologue_cost = 8, epilogue_cost = 32 .
t.c:4:21: missed:   can't operate on partial vectors because no conditional operation is available.

That is vect_reduction_update_partial_vector_usage at work, which gets
.COND_ADD as 'code', and then things go downhill.
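
For reference, a scalar sketch of why the reduction statement shows up as
.COND_ADD in the first place.  It assumes the usual semantics
.COND_ADD (cond, a, b, else) == cond ? a + b : else, and that the -0.0 in
the else arm is what lets the select be folded into the add (x + -0.0 == x
under default rounding).  The names cond_add, foo_select, foo_cond_add and
check_equivalence are illustrative only and are not GCC code.

```c
#include <assert.h>

/* Scalar model of the internal function: cond ? a + b : else_val.  */
static double cond_add (int cond, double a, double b, double else_val)
{
  return cond ? a + b : else_val;
}

/* The testcase's reduction as written: inactive iterations add -0.0.  */
static double foo_select (const double *a, const char *mask, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    sum = sum + (mask[i] ? a[i] : -0.0);
  return sum;
}

/* The same reduction with the add itself made conditional; inactive
   iterations pass the accumulator through unchanged.  */
static double foo_cond_add (const double *a, const char *mask, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    sum = cond_add (mask[i] != 0, sum, a[i], sum);
  return sum;
}

/* Hypothetical helper: both forms agree on a small input.  */
static void check_equivalence (void)
{
  double a[4] = { 1.5, 2.0, -3.25, 4.0 };
  char mask[4] = { 1, 0, 1, 0 };
  assert (foo_select (a, mask, 4) == foo_cond_add (a, mask, 4));
}
```

So the reduction operation is already conditional before any loop mask is
applied; a partial-vector version would additionally have to combine the
loop mask with the existing condition.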
