https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122747

            Bug ID: 122747
           Summary: Cannot fully mask a conditional reduction
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

When going through vectorizable_* to implement the reduction operation, we
cannot handle a .COND_ADD reduction, for reasons similar to those in
PR122723.

double foo (double *a, long *mask, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    {
      double val;
      if (mask[i])
        val = a[i];
      else
        val = -0.0;
      sum = sum + val;
    }
  return sum;
}

> ./cc1 -quiet t2.c -Ofast -fopt-info-vec -march=znver4 
> -fdump-tree-vect-details --param vect-partial-vector-usage=1

t2.c:4:21: note:    === vectorizable_call ===
t2.c:4:21: note:    vect_model_simple_cost: inside_cost = 24, prologue_cost = 0 .
t2.c:4:21: missed:    can't use a fully-masked loop because no conditional operation is available.

Again, in vectorizable_call we do

  int reduc_idx = SLP_TREE_REDUC_IDX (slp_node);
  internal_fn cond_fn = get_conditional_internal_fn (ifn);
  internal_fn cond_len_fn = get_len_internal_fn (ifn);
...
      if (loop_vinfo
          && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
          && (reduc_idx >= 0 || mask_opno >= 0))
        {
          if (reduc_idx >= 0
              && (cond_fn == IFN_LAST
                  || !direct_internal_fn_supported_p (cond_fn, vectype_out,
                                                      OPTIMIZE_FOR_SPEED))
              && (cond_len_fn == IFN_LAST
                  || !direct_internal_fn_supported_p (cond_len_fn, vectype_out,
                                                      OPTIMIZE_FOR_SPEED)))

and later we possibly do not implement proper preparing of the loop mask
combined with the conditional mask in the operation.
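
For illustration only, here is a scalar C sketch of what a fully-masked
version of the testcase would have to compute. It is not vectorizer code:
VF and foo_masked are hypothetical names, and the lane loop merely emulates
one vector iteration. The point is that each lane must be governed by the
loop mask ANDed with the conditional mask, and that inactive lanes must pass
the accumulator through unchanged, which is what a .COND_ADD would do.

```c
#define VF 4  /* hypothetical vectorization factor, chosen for illustration */

/* Scalar emulation of a fully-masked conditional reduction.
   Per lane this models acc = combined_mask ? acc + a[idx] : acc.  */
double foo_masked (const double *a, const long *mask, int n)
{
  double acc[VF] = { 0.0, 0.0, 0.0, 0.0 };

  for (int i = 0; i < n; i += VF)
    for (int lane = 0; lane < VF; ++lane)
      {
        int idx = i + lane;
        /* Loop mask: lane is active only while still inside the trip count.  */
        int loop_mask = idx < n;
        /* Combined mask: loop mask AND conditional mask.  The && keeps the
           emulation from reading mask[idx] past the end of the array.  */
        int combined_mask = loop_mask && mask[idx] != 0;
        if (combined_mask)
          acc[lane] += a[idx];
        /* Otherwise the lane's accumulator is left untouched.  */
      }

  /* Final reduction across lanes.  This reassociates the additions, which
     is only acceptable under flags such as -Ofast.  */
  double sum = 0.0;
  for (int lane = 0; lane < VF; ++lane)
    sum += acc[lane];
  return sum;
}
```

With n not a multiple of VF, the trailing lanes of the last iteration are
switched off by the loop mask alone, so no scalar epilogue is needed.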
