RE: [PATCH] Pass reduction var to vectorize_fold_left_reduction directly

Richard Biener Fri, 29 Aug 2025 01:11:47 -0700

On Fri, 29 Aug 2025, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Friday, August 29, 2025 9:15 AM
> > To: Andrew Pinski <andrew.pin...@oss.qualcomm.com>
> > Cc: Tamar Christina <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org; 
> > RISC-
> > V CI <patchworks...@rivosinc.com>
> > Subject: Re: [PATCH] Pass reduction var to vectorize_fold_left_reduction 
> > directly
> > 
> > On Thu, 28 Aug 2025, Andrew Pinski wrote:
> > 
> > > On Thu, Aug 28, 2025 at 12:17 AM Richard Biener <rguent...@suse.de> wrote:
> > > >
> > > > On Wed, 27 Aug 2025, Tamar Christina wrote:
> > > >
> > > > > > -----Original Message-----
> > > > > > From: Richard Biener <rguent...@suse.de>
> > > > > > Sent: Wednesday, August 27, 2025 2:56 PM
> > > > > > To: gcc-patches@gcc.gnu.org
> > > > > > Cc: RISC-V CI <patchworks...@rivosinc.com>; Tamar Christina
> > > > > > <tamar.christ...@arm.com>
> > > > > > Subject: [PATCH] Pass reduction var to 
> > > > > > vectorize_fold_left_reduction directly
> > > > > >
> > > > > > Instead of going via the PHI node def, use the scalar reduction
> > > > > > input from the reduction stmt.
> > > > > >
> > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > > >
> > > > > > Sorry for the re-post, this is now re-based on pristine trunk,
> > > > > > hopefully making git am happy.
> > > > >
> > > > > AArch64 and Arm are clean, I did notice an ICE on x86_64 though:
> > > > >
> > > > > spawn -ignore SIGHUP /opt/buildAgent/temp/buildTmp/gcc/xgcc -
> > B/opt/buildAgent/temp/buildTmp/gcc/
> > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/testsuite/gcc.dg/vect/vect-cond-
> > reduc-in-order-2-signed-zero.c -m64 -fdiagnostics-plain-output -msse2 
> > -ftree-
> > vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model 
> > -fno-common -
> > O2 -fdump-tree-vect-details -std=gnu99 -fno-fast-math -lm -o 
> > ./vect-cond-reduc-
> > in-order-2-signed-zero.exe
> > > > > during GIMPLE pass: vect
> > > > > dump file: ./vect-cond-reduc-in-order-2-signed-zero.c.185t.vect
> > > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/testsuite/gcc.dg/vect/vect-
> > cond-reduc-in-order-2-signed-zero.c: In function 'reduc_plus_double':
> > > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/testsuite/gcc.dg/vect/vect-
> > cond-reduc-in-order-2-signed-zero.c:13:1: internal compiler error: in
> > compute_live_loop_exits, at tree-ssa-loop-manip.cc:262
> > > > > 0x2e864c7 internal_error(char const*, ...)
> > > > >   /opt/buildAgent/work/505bfdd4dad8af3d/gcc/diagnostic-global-
> > context.cc:534
> > > > > 0xf08d9c fancy_abort(char const*, int, char const*)
> > > > >
> > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/diagnostics/context.cc:1687
> > > > > 0x99b97d compute_live_loop_exits
> > > > >   /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:262
> > > > > 0x99b97d add_exit_phis_var
> > > > >   /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:344
> > > > > 0x99b97d add_exit_phis
> > > > >   /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:404
> > > > > 0x19092bf rewrite_into_loop_closed_ssa_1
> > > > >   /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:618
> > > > > 0x19092bf rewrite_into_loop_closed_ssa(bitmap_head*, unsigned int)
> > > > >   /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:640
> > > > > 0x1ad7855 execute
> > > > >   /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-vectorizer.cc:1399
> > > > >
> > > > > The compiler was configured with:
> > > > >
> > > > >   + /opt/buildAgent/work/505bfdd4dad8af3d/configure --enable-
> > checking=yes,rtl,extra --with-arch=native --enable-multilib --with-multilib-
> > list=m32,m64
> > > >
> > > > -march=native, OK, I can reproduce that.  I do wonder whether we want to
> > > > iterate over SSE, AVX2, AVX512 in vect.exp testing ... (and possibly
> > > > drop -flto there).
> > >
> > > For aarch64, we should also iterate with -march=armv8-a+sve too.
> > > Dropping LTO might be ok except for the cost model testcases (I can't
> > > remember if there is any with vect.exp).
> > 
> > I'll note that with LTO we scan the non-LTO fat-LTO dump files, so the
> > only thing we get is the execution validation.
> > 
> > Doing
> > 
> > if { [istarget i?86-*-*] || [istarget x86_64-*-*] } {
> >     lappend VECT_ADDITIONAL_FLAGS "-march=x86-64-v3"
> >     lappend VECT_ADDITIONAL_FLAGS "-march=x86-64-v4"
> > }
> > 
> > without removing the "-flto -ffat-lto-objects" run increases -j12
> > check-gcc vect.exp CPU time from 2631s to 5046s (that's with a -O0 and
> > checking enabled compiler).  Removing -flto recovers ~1500s.
> > 
> > The above also shows that check_vect() doesn't properly handle AVX512...
> > There's possibly a way to do the dg-do run -> compile drop for the
> > AVX512 case when !avx512_runtime, but IMO check_vect() should be "fixed"
> > here.
> > 
> > We also get some new scan FAILs, so some testsuite adjustments are needed.
> > 
> > Given the AVX512 situation I'd probably start with only adding
> > -march=x86-64-v3 for now.
> > 
> 
> Sounds reasonable, I haven't run vect in SVE only mode for a while, but last 
> time
> there was quite a few failures due to simply some of the code paths (and thus 
> scan
> dumps) don't apply to VLA. So similar to what you saw.
> 
> You'd also have to force SVE only with -mautovec-preference=sve-only since 
> just
> enabling SVE doesn't mean it'll use it.  Though the -fno-vect-cost-model 
> makes it
> likely it does.
> 
> You also have the problem that there's no way to effectuate a change in only 
> extensions
> but not change baseline options. So even if we run with `-march=armv8-a+sve` 
> the
> effective targets tests may force it back off, for instance 
> gcc/testsuite/gcc.dg/vect/vect-reduc-dot-10.c
> would force SVE off because we have no way of just enabling the extension.


I think that's OK though.  It would be nice to have such test UNSUPPORTED
rather than re-run in the same way twice, but well ...

So with just x86-64-v3 (aka AVX2) I see 26 unexpected FAILs and one
XPASS in the C part of vect.exp.

I'll re-iterate another missing feature, the ability to add sth like

/* { dg-extra-torture { { -march=x86-64-v3 } { -msse4.1 } } { target { 
x86-64-*-* i?86-*-* } } } */

to select tests.

Richard.

> Just some food for thought.
> 
> Tamar
> 
> > Richard.
> > 
> > > Thanks,
> > > Andrew
> > >
> > > >
> > > > Investigating.
> > > >
> > > > Richard.
> > > >
> > > > > Cheers,
> > > > > Tamar
> > > > >
> > > > > >
> > > > > >     * tree-vect-loop.cc (vectorize_fold_left_reduction): Get
> > > > > >     reduc_var as argument.
> > > > > >     (vect_transform_reduction): Adjust.
> > > > > > ---
> > > > > >  gcc/tree-vect-loop.cc | 8 +++-----
> > > > > >  1 file changed, 3 insertions(+), 5 deletions(-)
> > > > > >
> > > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > > > index f22613e5a3a..f48a18a593d 100644
> > > > > > --- a/gcc/tree-vect-loop.cc
> > > > > > +++ b/gcc/tree-vect-loop.cc
> > > > > > @@ -6501,7 +6501,7 @@ vectorize_fold_left_reduction (loop_vec_info
> > > > > > loop_vinfo,
> > > > > >                            stmt_vec_info stmt_info,
> > > > > >                            gimple_stmt_iterator *gsi,
> > > > > >                            slp_tree slp_node,
> > > > > > -                          gimple *reduc_def_stmt,
> > > > > > +                          tree reduc_var,
> > > > > >                            code_helper code, internal_fn reduc_fn,
> > > > > >                            int num_ops, tree vectype_in,
> > > > > >                            int reduc_index, vec_loop_masks *masks,
> > > > > > @@ -6546,7 +6546,6 @@ vectorize_fold_left_reduction (loop_vec_info
> > > > > > loop_vinfo,
> > > > > >    gimple *sdef = vect_orig_stmt (scalar_dest_def_info)->stmt;
> > > > > >    tree scalar_dest = gimple_get_lhs (sdef);
> > > > > >    tree scalar_type = TREE_TYPE (scalar_dest);
> > > > > > -  tree reduc_var = gimple_phi_result (reduc_def_stmt);
> > > > > >
> > > > > >    int vec_num = vec_oprnds0.length ();
> > > > > >    tree vec_elem_type = TREE_TYPE (vectype_out);
> > > > > > @@ -8016,8 +8015,6 @@ vect_transform_reduction (loop_vec_info
> > loop_vinfo,
> > > > > >       The last use is the reduction variable.  In case of nested 
> > > > > > cycle this
> > > > > >       assumption is not true: we use reduc_index to record the 
> > > > > > index of the
> > > > > >       reduction variable.  */
> > > > > > -  stmt_vec_info phi_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt
> > > > > > (stmt_info));
> > > > > > -  gphi *reduc_def_phi = as_a <gphi *> (phi_info->stmt);
> > > > > >    int reduc_index = STMT_VINFO_REDUC_IDX (stmt_info);
> > > > > >    tree vectype_in = SLP_TREE_VECTYPE (SLP_TREE_CHILDREN
> > (slp_node)[0]);
> > > > > >
> > > > > > @@ -8059,8 +8056,9 @@ vect_transform_reduction (loop_vec_info
> > loop_vinfo,
> > > > > >      {
> > > > > >        internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
> > > > > >        gcc_assert (code.is_tree_code () || cond_fn_p);
> > > > > > +      tree reduc_var = op.ops[reduc_index];
> > > > > >        return vectorize_fold_left_reduction
> > > > > > -     (loop_vinfo, stmt_info, gsi, slp_node, reduc_def_phi,
> > > > > > +     (loop_vinfo, stmt_info, gsi, slp_node, reduc_var,
> > > > > >        code, reduc_fn, op.num_ops, vectype_in,
> > > > > >        reduc_index, masks, lens);
> > > > > >      }
> > > > > > --
> > > > > > 2.43.0
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguent...@suse.de>
> > > > SUSE Software Solutions Germany GmbH,
> > > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> > >
> > 
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

RE: [PATCH] Pass reduction var to vectorize_fold_left_reduction directly

Reply via email to