RE: [PATCH] Pass reduction var to vectorize_fold_left_reduction directly

Tamar Christina Fri, 29 Aug 2025 00:34:45 -0700

> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Friday, August 29, 2025 9:15 AM
> To: Andrew Pinski <andrew.pin...@oss.qualcomm.com>
> Cc: Tamar Christina <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org; RISC-
> V CI <patchworks...@rivosinc.com>
> Subject: Re: [PATCH] Pass reduction var to vectorize_fold_left_reduction 
> directly
> 
> On Thu, 28 Aug 2025, Andrew Pinski wrote:
> 
> > On Thu, Aug 28, 2025 at 12:17 AM Richard Biener <rguent...@suse.de> wrote:
> > >
> > > On Wed, 27 Aug 2025, Tamar Christina wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <rguent...@suse.de>
> > > > > Sent: Wednesday, August 27, 2025 2:56 PM
> > > > > To: gcc-patches@gcc.gnu.org
> > > > > Cc: RISC-V CI <patchworks...@rivosinc.com>; Tamar Christina
> > > > > <tamar.christ...@arm.com>
> > > > > Subject: [PATCH] Pass reduction var to vectorize_fold_left_reduction 
> > > > > directly
> > > > >
> > > > > Instead of going via the PHI node def, use the scalar reduction
> > > > > input from the reduction stmt.
> > > > >
> > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > >
> > > > > Sorry for the re-post, this is now re-based on pristine trunk,
> > > > > hopefully making git am happy.
> > > >
> > > > AArch64 and Arm are clean, I did notice an ICE on x86_64 though:
> > > >
> > > > spawn -ignore SIGHUP /opt/buildAgent/temp/buildTmp/gcc/xgcc -
> B/opt/buildAgent/temp/buildTmp/gcc/
> /opt/buildAgent/work/505bfdd4dad8af3d/gcc/testsuite/gcc.dg/vect/vect-cond-
> reduc-in-order-2-signed-zero.c -m64 -fdiagnostics-plain-output -msse2 -ftree-
> vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common 
> -
> O2 -fdump-tree-vect-details -std=gnu99 -fno-fast-math -lm -o 
> ./vect-cond-reduc-
> in-order-2-signed-zero.exe
> > > > during GIMPLE pass: vect
> > > > dump file: ./vect-cond-reduc-in-order-2-signed-zero.c.185t.vect
> > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/testsuite/gcc.dg/vect/vect-
> cond-reduc-in-order-2-signed-zero.c: In function 'reduc_plus_double':
> > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/testsuite/gcc.dg/vect/vect-
> cond-reduc-in-order-2-signed-zero.c:13:1: internal compiler error: in
> compute_live_loop_exits, at tree-ssa-loop-manip.cc:262
> > > > 0x2e864c7 internal_error(char const*, ...)
> > > >   /opt/buildAgent/work/505bfdd4dad8af3d/gcc/diagnostic-global-
> context.cc:534
> > > > 0xf08d9c fancy_abort(char const*, int, char const*)
> > > >
> /opt/buildAgent/work/505bfdd4dad8af3d/gcc/diagnostics/context.cc:1687
> > > > 0x99b97d compute_live_loop_exits
> > > >   /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:262
> > > > 0x99b97d add_exit_phis_var
> > > >   /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:344
> > > > 0x99b97d add_exit_phis
> > > >   /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:404
> > > > 0x19092bf rewrite_into_loop_closed_ssa_1
> > > >   /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:618
> > > > 0x19092bf rewrite_into_loop_closed_ssa(bitmap_head*, unsigned int)
> > > >   /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:640
> > > > 0x1ad7855 execute
> > > >   /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-vectorizer.cc:1399
> > > >
> > > > The compiler was configured with:
> > > >
> > > >   + /opt/buildAgent/work/505bfdd4dad8af3d/configure --enable-
> checking=yes,rtl,extra --with-arch=native --enable-multilib --with-multilib-
> list=m32,m64
> > >
> > > -march=native, OK, I can reproduce that.  I do wonder whether we want to
> > > iterate over SSE, AVX2, AVX512 in vect.exp testing ... (and possibly
> > > drop -flto there).
> >
> > For aarch64, we should also iterate with -march=armv8-a+sve too.
> > Dropping LTO might be ok except for the cost model testcases (I can't
> > remember if there is any with vect.exp).
> 
> I'll note that with LTO we scan the non-LTO fat-LTO dump files, so the
> only thing we get is the execution validation.
> 
> Doing
> 
> if { [istarget i?86-*-*] || [istarget x86_64-*-*] } {
>     lappend VECT_ADDITIONAL_FLAGS "-march=x86-64-v3"
>     lappend VECT_ADDITIONAL_FLAGS "-march=x86-64-v4"
> }
> 
> without removing the "-flto -ffat-lto-objects" run increases -j12
> check-gcc vect.exp CPU time from 2631s to 5046s (that's with a -O0 and
> checking enabled compiler).  Removing -flto recovers ~1500s.
> 
> The above also shows that check_vect() doesn't properly handle AVX512...
> There's possibly a way to do the dg-do run -> compile drop for the
> AVX512 case when !avx512_runtime, but IMO check_vect() should be "fixed"
> here.
> 
> We also get some new scan FAILs, so some testsuite adjustments are needed.
> 
> Given the AVX512 situation I'd probably start with only adding
> -march=x86-64-v3 for now.
>


Sounds reasonable, I haven't run vect in SVE only mode for a while, but last 
time
there was quite a few failures due to simply some of the code paths (and thus 
scan
dumps) don't apply to VLA. So similar to what you saw.

You'd also have to force SVE only with -mautovec-preference=sve-only since just
enabling SVE doesn't mean it'll use it.  Though the -fno-vect-cost-model makes 
it
likely it does.

You also have the problem that there's no way to effectuate a change in only 
extensions
but not change baseline options. So even if we run with `-march=armv8-a+sve` the
effective targets tests may force it back off, for instance 
gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c
would force SVE off because we have no way of just enabling the extension.

Just some food for thought.

Tamar

> Richard.
> 
> > Thanks,
> > Andrew
> >
> > >
> > > Investigating.
> > >
> > > Richard.
> > >
> > > > Cheers,
> > > > Tamar
> > > >
> > > > >
> > > > >     * tree-vect-loop.cc (vectorize_fold_left_reduction): Get
> > > > >     reduc_var as argument.
> > > > >     (vect_transform_reduction): Adjust.
> > > > > ---
> > > > >  gcc/tree-vect-loop.cc | 8 +++-----
> > > > >  1 file changed, 3 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > > index f22613e5a3a..f48a18a593d 100644
> > > > > --- a/gcc/tree-vect-loop.cc
> > > > > +++ b/gcc/tree-vect-loop.cc
> > > > > @@ -6501,7 +6501,7 @@ vectorize_fold_left_reduction (loop_vec_info
> > > > > loop_vinfo,
> > > > >                            stmt_vec_info stmt_info,
> > > > >                            gimple_stmt_iterator *gsi,
> > > > >                            slp_tree slp_node,
> > > > > -                          gimple *reduc_def_stmt,
> > > > > +                          tree reduc_var,
> > > > >                            code_helper code, internal_fn reduc_fn,
> > > > >                            int num_ops, tree vectype_in,
> > > > >                            int reduc_index, vec_loop_masks *masks,
> > > > > @@ -6546,7 +6546,6 @@ vectorize_fold_left_reduction (loop_vec_info
> > > > > loop_vinfo,
> > > > >    gimple *sdef = vect_orig_stmt (scalar_dest_def_info)->stmt;
> > > > >    tree scalar_dest = gimple_get_lhs (sdef);
> > > > >    tree scalar_type = TREE_TYPE (scalar_dest);
> > > > > -  tree reduc_var = gimple_phi_result (reduc_def_stmt);
> > > > >
> > > > >    int vec_num = vec_oprnds0.length ();
> > > > >    tree vec_elem_type = TREE_TYPE (vectype_out);
> > > > > @@ -8016,8 +8015,6 @@ vect_transform_reduction (loop_vec_info
> loop_vinfo,
> > > > >       The last use is the reduction variable.  In case of nested 
> > > > > cycle this
> > > > >       assumption is not true: we use reduc_index to record the index 
> > > > > of the
> > > > >       reduction variable.  */
> > > > > -  stmt_vec_info phi_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt
> > > > > (stmt_info));
> > > > > -  gphi *reduc_def_phi = as_a <gphi *> (phi_info->stmt);
> > > > >    int reduc_index = STMT_VINFO_REDUC_IDX (stmt_info);
> > > > >    tree vectype_in = SLP_TREE_VECTYPE (SLP_TREE_CHILDREN
> (slp_node)[0]);
> > > > >
> > > > > @@ -8059,8 +8056,9 @@ vect_transform_reduction (loop_vec_info
> loop_vinfo,
> > > > >      {
> > > > >        internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
> > > > >        gcc_assert (code.is_tree_code () || cond_fn_p);
> > > > > +      tree reduc_var = op.ops[reduc_index];
> > > > >        return vectorize_fold_left_reduction
> > > > > -     (loop_vinfo, stmt_info, gsi, slp_node, reduc_def_phi,
> > > > > +     (loop_vinfo, stmt_info, gsi, slp_node, reduc_var,
> > > > >        code, reduc_fn, op.num_ops, vectype_in,
> > > > >        reduc_index, masks, lens);
> > > > >      }
> > > > > --
> > > > > 2.43.0
> > > >
> > >
> > > --
> > > Richard Biener <rguent...@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
> >
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

RE: [PATCH] Pass reduction var to vectorize_fold_left_reduction directly

Reply via email to