https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122747
Bug ID: 122747
Summary: Cannot fully mask a conditional reduction
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
When going through vectorizable_* to implement the reduction operation we
cannot handle a .COND_ADD reduction, for reasons similar to those in
PR122723.
double foo (double *a, long *mask, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    {
      double val;
      if (mask[i])
        val = a[i];
      else
        val = -0.0;
      sum = sum + val;
    }
  return sum;
}
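For reference, a minimal scalar sketch of why this loop counts as a .COND_ADD reduction: adding -0.0 leaves the sum unchanged, so the select-then-add form should be equivalent to a conditional add whose else value is the incoming sum. The helper cond_add and the names foo_select and foo_cond_add are hypothetical, used only for illustration; they model the semantics and are not GCC internals.

```c
#include <assert.h>

/* Hypothetical scalar model of .COND_ADD (cond, a, b, else_val):
   yields a + b when cond is true, else_val otherwise.  */
static double
cond_add (int cond, double a, double b, double else_val)
{
  return cond ? a + b : else_val;
}

/* The loop from the test case: add a[i] where mask[i] is set,
   and the neutral element -0.0 otherwise.  */
double
foo_select (const double *a, const long *mask, int n)
{
  assert (n >= 0);
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    sum = sum + (mask[i] ? a[i] : -0.0);
  return sum;
}

/* The same reduction written as a conditional add: inactive
   iterations pass the running sum through unchanged.  */
double
foo_cond_add (const double *a, const long *mask, int n)
{
  assert (n >= 0);
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    sum = cond_add (mask[i] != 0, sum, a[i], sum);
  return sum;
}
```

Both functions walk the elements in the same order, so they should agree exactly on any input, not only under -Ofast.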
> ./cc1 -quiet t2.c -Ofast -fopt-info-vec -march=znver4
> -fdump-tree-vect-details --param vect-partial-vector-usage=1
t2.c:4:21: note: === vectorizable_call ===
t2.c:4:21: note: vect_model_simple_cost: inside_cost = 24, prologue_cost = 0 .
t2.c:4:21: missed: can't use a fully-masked loop because no conditional operation is available.
again we do
  int reduc_idx = SLP_TREE_REDUC_IDX (slp_node);
  internal_fn cond_fn = get_conditional_internal_fn (ifn);
  internal_fn cond_len_fn = get_len_internal_fn (ifn);
...
  if (loop_vinfo
      && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
      && (reduc_idx >= 0 || mask_opno >= 0))
    {
      if (reduc_idx >= 0
          && (cond_fn == IFN_LAST
              || !direct_internal_fn_supported_p (cond_fn, vectype_out,
                                                  OPTIMIZE_FOR_SPEED))
          && (cond_len_fn == IFN_LAST
              || !direct_internal_fn_supported_p (cond_len_fn, vectype_out,
                                                  OPTIMIZE_FOR_SPEED)))
and later we possibly do not implement proper preparation of the loop mask
with the conditional mask in the operation.
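
As a rough illustration of what a fully-masked version would need to compute, here is a scalar emulation in C. It is only a sketch under assumptions: VF, foo_fully_masked, and the per-lane accumulator layout are hypothetical and do not reflect what the vectorizer actually emits. The point is that each lane's conditional add has to be predicated on the loop mask ANDed with the condition mask.

```c
#include <assert.h>

#define VF 4 /* hypothetical vectorization factor, for illustration only */

/* Scalar emulation of a fully-masked loop for the conditional reduction.
   Lanes past the end of the array are disabled by the loop mask, and the
   conditional add runs under (loop mask & condition mask).  */
double
foo_fully_masked (const double *a, const long *mask, int n)
{
  assert (n >= 0);
  double acc[VF] = { 0.0 }; /* one partial sum per vector lane */

  for (int i = 0; i < n; i += VF)
    for (int lane = 0; lane < VF; ++lane)
      {
        /* Loop mask: is this lane still inside the iteration space?  */
        int loop_mask = (i + lane) < n;
        /* Condition mask, read with a masked load so that inactive
           lanes never touch memory.  */
        int cond_mask = loop_mask ? (mask[i + lane] != 0) : 0;
        /* The mask the conditional add has to be predicated on.  */
        int final_mask = loop_mask & cond_mask;
        /* Conditional add: inactive lanes keep their accumulator.  */
        if (final_mask)
          acc[lane] = acc[lane] + a[i + lane];
      }

  /* Final reduction across lanes.  */
  double sum = 0.0;
  for (int lane = 0; lane < VF; ++lane)
    sum += acc[lane];
  return sum;
}
```

Note that the per-lane partial sums reassociate the additions, which is only valid with the relaxed floating-point semantics that -Ofast enables.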