On Mon, Oct 13, 2025 at 9:22 AM Eric Botcazou <[email protected]> wrote:
>
> Hi,
>
> the attached Ada testcase compiled for x86_64-w64-mingw32 with -O3 -msse2 -
> gnatn exposes an instability in the SLP reduction detection: in 32-bit mode,
> the inner loop is effectively vectorized (after being unrolled by cunrolli):
>
> r1.adb:6:9: optimized: loop vectorized using 16 byte vectors and unroll factor
> 1
>
> whereas it is not in 64-bit mode:
>
> r1.adb:6:9: missed: couldn't vectorize loop
>
> despite almost similar .ifcvt tree dumps. The difference stems from this:
>
> <bb 3> [local count: 268435456]:
> # i_1 = PHI <0(5), i_8(6)>
> # ret$F$1_17 = PHI <ret$F$1_31(5), _27(6)>
> # ret$F$2_30 = PHI <ret$F$2_32(5), _35(6)>
> # ret$F$3_21 = PHI <ret$F$3_37(5), _43(6)>
> # ret$F$4_24 = PHI <ret$F$4_39(5), _16(6)>
> i_8 = i_1 + 1;
> _27 = ret$F$1_17 + ret$F$1_31;
> _35 = ret$F$2_30 + ret$F$2_32;
> _43 = ret$F$3_21 + ret$F$3_37;
> _16 = ret$F$4_24 + ret$F$4_39;
>
> in the 32-bit .ifcvt tree dump and this:
>
> <bb 3> [local count: 268435456]:
> # i_1 = PHI <0(5), i_9(6)>
> # ret$F$1_31 = PHI <ret$F$1_20(5), _38(6)>
> # ret$F$2_29 = PHI <ret$F$2_19(5), _46(6)>
> # ret$F$3_21 = PHI <ret$F$3_49(5), _54(6)>
> # ret$F$4_47 = PHI <ret$F$4_41(5), _18(6)>
> i_9 = i_1 + 1;
> _38 = ret$F$1_20 + ret$F$1_31;
> _46 = ret$F$2_19 + ret$F$2_29;
> _54 = ret$F$3_21 + ret$F$3_49;
> _18 = ret$F$4_41 + ret$F$4_47;
>
> in the 64-bit .ifcvt tree dump. The first pattern is accepted as a reduction
> by vect_build_slp_tree_1 but the second is not:
>
> r1.adb:6:9: note: Build SLP for _18 = ret$F$4_41 + ret$F$4_47;
> r1.adb:6:9: note: Build SLP for _54 = ret$F$3_21 + ret$F$3_49;
> r1.adb:6:9: missed: Build SLP failed: different reduc_idx 0 instead of 1 in
> _54 = ret$F$3_21 + ret$F$3_49;
> r1.adb:6:9: note: Build SLP for _46 = ret$F$2_19 + ret$F$2_29;
> r1.adb:6:9: note: Build SLP for _38 = ret$F$1_20 + ret$F$1_31;
>
> which indicates that, if the operands of the addition had been swapped in:
>
> _54 = ret$F$3_21 + ret$F$3_49;
>
> then the pattern would have been recognized.
>
> vect_build_slp_tree_1 already deals with this kind of issues for COND_EXPR:
>
> Note COND_EXPR is possibly isomorphic to another one after swapping its
> operands. Set SWAP[i] to 1 if stmt I is COND_EXPR and isomorphic to
> the first stmt by swapping the two operands of comparison; set SWAP[i]
> to 2 if stmt I is isormorphic to the first stmt by inverting the code
> of comparison. Take A1 >= B1 ? X1 : Y1 as an exmple, it can be swapped
> to (B1 <= A1 ? X1 : Y1); or be inverted to (A1 < B1) ? Y1 : X1. */
>
> but not for commutative operations.
>
> Does this mean that some form of canonicalization should have happened earlier
> to make the reduction detection more robust for them? Or does the function
> need to cope with them specifically, for example as in the attached patch that
> fixes in the instability for this case? Or else should the reduction have
> been detected earlier?
I think the patch is wrong, it ignores the difference. But then the
issue is that
in vect_build_slp_tree_2 when we try swapping operands of mismatched
stmts we do not handle swapping for the reduc-idx (because that wasn't possible
in the past). I _think_ it should be possible to relax this now, aka
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 950c6fb798e..00250251881 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2915,9 +2915,7 @@ out:
&& (nops == 2 || nops == 3)
&& oprnds_info[1]->first_dt == vect_internal_def
&& (is_gimple_assign (stmt_info->stmt)
- || is_gimple_call (stmt_info->stmt))
- /* Swapping operands for reductions breaks assumptions later on. */
- && STMT_VINFO_REDUC_IDX (stmt_info) == -1)
+ || is_gimple_call (stmt_info->stmt)))
{
/* See whether we can swap the matching or the non-matching
stmt operands. */
>
>
> * tree-vect-slp.cc (vect_build_slp_tree_1): Accept swapped operands
> for commutative operations in reduction statements.
>
>
> --
> Eric Botcazou