On Thu, 28 Aug 2025, Andrew Pinski wrote: > On Thu, Aug 28, 2025 at 12:17 AM Richard Biener <rguent...@suse.de> wrote: > > > > On Wed, 27 Aug 2025, Tamar Christina wrote: > > > > > > -----Original Message----- > > > > From: Richard Biener <rguent...@suse.de> > > > > Sent: Wednesday, August 27, 2025 2:56 PM > > > > To: gcc-patches@gcc.gnu.org > > > > Cc: RISC-V CI <patchworks...@rivosinc.com>; Tamar Christina > > > > <tamar.christ...@arm.com> > > > > Subject: [PATCH] Pass reduction var to vectorize_fold_left_reduction > > > > directly > > > > > > > > Instead of going via the PHI node def, use the scalar reduction > > > > input from the reduction stmt. > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > > > > > Sorry for the re-post, this is now re-based on pristine trunk, > > > > hopefully making git am happy. > > > > > > AArch64 and Arm are clean, I did notice an ICE on x86_64 though: > > > > > > spawn -ignore SIGHUP /opt/buildAgent/temp/buildTmp/gcc/xgcc > > > -B/opt/buildAgent/temp/buildTmp/gcc/ > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c > > > -m64 -fdiagnostics-plain-output -msse2 -ftree-vectorize > > > -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 > > > -fdump-tree-vect-details -std=gnu99 -fno-fast-math -lm -o > > > ./vect-cond-reduc-in-order-2-signed-zero.exe > > > during GIMPLE pass: vect > > > dump file: ./vect-cond-reduc-in-order-2-signed-zero.c.185t.vect > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: > > > In function 'reduc_plus_double': > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c:13:1: > > > internal compiler error: in compute_live_loop_exits, at > > > tree-ssa-loop-manip.cc:262 > > > 0x2e864c7 internal_error(char const*, ...) > > > > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/diagnostic-global-context.cc:534 > > > 0xf08d9c fancy_abort(char const*, int, char const*) > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/diagnostics/context.cc:1687 > > > 0x99b97d compute_live_loop_exits > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:262 > > > 0x99b97d add_exit_phis_var > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:344 > > > 0x99b97d add_exit_phis > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:404 > > > 0x19092bf rewrite_into_loop_closed_ssa_1 > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:618 > > > 0x19092bf rewrite_into_loop_closed_ssa(bitmap_head*, unsigned int) > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-ssa-loop-manip.cc:640 > > > 0x1ad7855 execute > > > /opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-vectorizer.cc:1399 > > > > > > The compiler was configured with: > > > > > > + /opt/buildAgent/work/505bfdd4dad8af3d/configure > > > --enable-checking=yes,rtl,extra --with-arch=native --enable-multilib > > > --with-multilib-list=m32,m64 > > > > -march=native, OK, I can reproduce that. I do wonder whether we want to > > iterate over SSE, AVX2, AVX512 in vect.exp testing ... (and possibly > > drop -flto there). > > For aarch64, we should also iterate with -march=armv8-a+sve too. > Dropping LTO might be ok except for the cost model testcases (I can't > remember if there is any with vect.exp).
I'll note that with LTO we scan the non-LTO fat-LTO dump files, so the only thing we get is the execution validation. Doing if { [istarget i?86-*-*] || [istarget x86_64-*-*] } { lappend VECT_ADDITIONAL_FLAGS "-march=x86-64-v3" lappend VECT_ADDITIONAL_FLAGS "-march=x86-64-v4" } without removing the "-flto -ffat-lto-objects" run increases -j12 check-gcc vect.exp CPU time from 2631s to 5046s (that's with a -O0 and checking enabled compiler). Removing -flto recovers ~1500s. The above also shows that check_vect() doesn't properly handle AVX512... There's possibly a way to do the dg-do run -> compile drop for the AVX512 case when !avx512_runtime, but IMO check_vect() should be "fixed" here. We also get some new scan FAILs, so some testsuite adjustments are needed. Given the AVX512 situation I'd probably start with only adding -march=x86-64-v3 for now. Richard. > Thanks, > Andrew > > > > > Investigating. > > > > Richard. > > > > > Cheers, > > > Tamar > > > > > > > > > > > * tree-vect-loop.cc (vectorize_fold_left_reduction): Get > > > > reduc_var as argument. > > > > (vect_transform_reduction): Adjust. > > > > --- > > > > gcc/tree-vect-loop.cc | 8 +++----- > > > > 1 file changed, 3 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > > > > index f22613e5a3a..f48a18a593d 100644 > > > > --- a/gcc/tree-vect-loop.cc > > > > +++ b/gcc/tree-vect-loop.cc > > > > @@ -6501,7 +6501,7 @@ vectorize_fold_left_reduction (loop_vec_info > > > > loop_vinfo, > > > > stmt_vec_info stmt_info, > > > > gimple_stmt_iterator *gsi, > > > > slp_tree slp_node, > > > > - gimple *reduc_def_stmt, > > > > + tree reduc_var, > > > > code_helper code, internal_fn reduc_fn, > > > > int num_ops, tree vectype_in, > > > > int reduc_index, vec_loop_masks *masks, > > > > @@ -6546,7 +6546,6 @@ vectorize_fold_left_reduction (loop_vec_info > > > > loop_vinfo, > > > > gimple *sdef = vect_orig_stmt (scalar_dest_def_info)->stmt; > > > > tree scalar_dest = gimple_get_lhs (sdef); > > > > tree scalar_type = TREE_TYPE (scalar_dest); > > > > - tree reduc_var = gimple_phi_result (reduc_def_stmt); > > > > > > > > int vec_num = vec_oprnds0.length (); > > > > tree vec_elem_type = TREE_TYPE (vectype_out); > > > > @@ -8016,8 +8015,6 @@ vect_transform_reduction (loop_vec_info > > > > loop_vinfo, > > > > The last use is the reduction variable. In case of nested cycle > > > > this > > > > assumption is not true: we use reduc_index to record the index of > > > > the > > > > reduction variable. */ > > > > - stmt_vec_info phi_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt > > > > (stmt_info)); > > > > - gphi *reduc_def_phi = as_a <gphi *> (phi_info->stmt); > > > > int reduc_index = STMT_VINFO_REDUC_IDX (stmt_info); > > > > tree vectype_in = SLP_TREE_VECTYPE (SLP_TREE_CHILDREN (slp_node)[0]); > > > > > > > > @@ -8059,8 +8056,9 @@ vect_transform_reduction (loop_vec_info > > > > loop_vinfo, > > > > { > > > > internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info); > > > > gcc_assert (code.is_tree_code () || cond_fn_p); > > > > + tree reduc_var = op.ops[reduc_index]; > > > > return vectorize_fold_left_reduction > > > > - (loop_vinfo, stmt_info, gsi, slp_node, reduc_def_phi, > > > > + (loop_vinfo, stmt_info, gsi, slp_node, reduc_var, > > > > code, reduc_fn, op.num_ops, vectype_in, > > > > reduc_index, masks, lens); > > > > } > > > > -- > > > > 2.43.0 > > > > > > > -- > > Richard Biener <rguent...@suse.de> > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)