On Wed, 4 Feb 2026, Alfie Richards wrote:
> The 02/04/2026 13:22, Richard Biener wrote:
> > On Wed, 4 Feb 2026, Alfie Richards wrote:
> >
> > > This is a proof of concept. This passes bootstrap and regtest but I don't
> > > think this is a sensible code choice.
> > >
> > > This adds a rematerialize loads pass that does the following
> > > transformation:
> > >
> > > FROM:
> > > <bb 2>:
> > > _9 = *arr_4(D);
> > >
> > > <bb 3>:
> > > # _10 = PHI <_1(7), _9(2)>
> > > # arr_11 = PHI <arr_7(7), arr_4(D)(2)>
> > >
> > > # Some uses of _10
> > >
> > > <bb 4>:
> > > arr_7 = arr_11 + 1;
> > > _1 = *arr_7;
> > >
> > > if (_1 != 0)
> > > goto <bb 3>; [94.50%]
> > > else
> > > # Exit the loop
> > >
> > > TO:
> > >
> > > <bb 2>:
> > >
> > > <bb 3>:
> > > # arr_11 = PHI <arr_7(7), arr_4(D)(2)>
> > > # _10 = *arr_11
> > >
> > > <bb 4>:
> > > arr_7 = arr_11 + 1;
> > > _1 = *arr_7;
> > >
> > > if (_1 != 0)
> > > goto <bb 3>; [94.50%]
> > > else
> > > # Exit the loop
> > >
> > > This allows loops such as the below to be vectorized.
> > >
> > > int foo(char *arr, char c) {
> > > while (*arr) {
> > > if (*arr == c) return 1;
> > > arr++;
> > > }
> > > return 0;
> > > }
> > >
> > > I'm not sure where this transformation fits. I could imagine it going
> > > in ifcvt, though it feels conceptually disjoint. Maybe in vect, but
> > > before analysis?
> > >
> > > I see roughly 0.3% geomean speedup at O3 with this accross various
> > > benchmarks on aarch64 and on x86 with avx512.
> > >
> > > Any feedback would be appreciated.
> >
> > I'd not make an extra pass but put it in tree-if-conv.cc. In theory
> > we could also mangle the SLP tree after construction, thus have this
> > as an extra "SLP PHI elimination phase", it might or might not fit well
> > into SLP pattern processing (maybe it alters the SLP graph structure
> > too much).
>
> Yup, I'm happy moving this to tree-if-conv.cc.
>
> What would be the benefit of the SLP phi elimination phase?
> Would that be instead of doing this in tree-if-conv.cc? If so we would still
> have to deal with vect_analyze_scalar_cycles rejecting it.
Yes, instead of tree-if-conv.cc. vect_analyze_scalar_cycles would have
to "accept it", in the sense of keeping it unknown, thus requiring
later elimination.
tree-if-conv.cc is more straight-forward, doing it in the vectorizer
only would ease getting rid of tree-if-conv.cc in the future.
> > You are missing correctness checks - consider an aliasing
> > store after _1 = *arr_7 or after the load of _9 = *arr_4(D).
>
> Ahh yes thank you. Will add that for a full patch in/close to stage 1.
>
> Thanks,
> Alfie
> >
> > Richard.
> >
> > > Thanks,
> > > Alfie
> > >
> > > gcc/ChangeLog:
> > >
> > > * Makefile.in: Add new file
> > > * passes.def: Add rematerialize_loads pass.
> > > * timevar.def (TV_TREE_REMATERIALIZE_LOADS): Add rematerialize loads
> > > pass.
> > > * tree-pass.h (make_pass_rematerialize_loads): Add rematerialize loads
> > > pass.
> > > * rematerialize_loads.cc: New pass.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/vect/vect_rematerialize_loads.c: New test.
> > > ---
> > > gcc/Makefile.in | 1 +
> > > gcc/passes.def | 1 +
> > > gcc/rematerialize_loads.cc | 321 ++++++++++++++++++
> > > .../vect/vect-early-break-rematerialize.c | 16 +
> > > gcc/timevar.def | 1 +
> > > gcc/tree-pass.h | 1 +
> > > 6 files changed, 341 insertions(+)
> > > create mode 100644 gcc/rematerialize_loads.cc
> > > create mode 100644
> > > gcc/testsuite/gcc.dg/vect/vect-early-break-rematerialize.c
> > >
> > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > index abf98aabac8..00ae012850b 100644
> > > --- a/gcc/Makefile.in
> > > +++ b/gcc/Makefile.in
> > > @@ -1757,6 +1757,7 @@ OBJS = \
> > > tree-eh.o \
> > > tree-emutls.o \
> > > tree-if-conv.o \
> > > + rematerialize_loads.o \
> > > tree-inline.o \
> > > tree-into-ssa.o \
> > > tree-iterator.o \
> > > diff --git a/gcc/passes.def b/gcc/passes.def
> > > index cdddb87302f..0b9ffc3f1ad 100644
> > > --- a/gcc/passes.def
> > > +++ b/gcc/passes.def
> > > @@ -317,6 +317,7 @@ along with GCC; see the file COPYING3. If not see
> > > NEXT_PASS (pass_expand_omp_ssa);
> > > NEXT_PASS (pass_ch_vect);
> > > NEXT_PASS (pass_if_conversion);
> > > + NEXT_PASS (pass_rematerialize_loads);
> > > /* pass_vectorize must immediately follow pass_if_conversion.
> > > Please do not add any other passes in between. */
> > > NEXT_PASS (pass_vectorize);
> > > diff --git a/gcc/rematerialize_loads.cc b/gcc/rematerialize_loads.cc
> > > new file mode 100644
> > > index 00000000000..d55a6a70a74
> > > --- /dev/null
> > > +++ b/gcc/rematerialize_loads.cc
> > > @@ -0,0 +1,321 @@
> > > +/* Load rematerialization.
> > > + Copyright (C) 2026 Free Software Foundation, Inc.
> > > +
> > > + This file is part of GCC.
> > > +
> > > + GCC is free software; you can redistribute it and/or modify it
> > > + under the terms of the GNU General Public License as published by
> > > + the Free Software Foundation; either version 3, or (at your option)
> > > + any later version.
> > > +
> > > + GCC is distributed in the hope that it will be useful, but WITHOUT
> > > + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> > > + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
> > > + License for more details.
> > > +
> > > + You should have received a copy of the GNU General Public License
> > > + along with GCC; see the file COPYING3. If not see
> > > + <http://www.gnu.org/licenses/>. */
> > > +
> > > +#include "config.h"
> > > +#include "system.h"
> > > +#include "coretypes.h"
> > > +#include "backend.h"
> > > +#include "rtl.h"
> > > +#include "tree.h"
> > > +#include "gimple.h"
> > > +#include "cfghooks.h"
> > > +#include "tree-pass.h"
> > > +#include "ssa.h"
> > > +#include "expr.h"
> > > +#include "fold-const.h"
> > > +#include "gimple-iterator.h"
> > > +#include "tree-into-ssa.h"
> > > +#include "tree-ssa.h"
> > > +#include "cfgloop.h"
> > > +#include "tree-vectorizer.h"
> > > +
> > > +/* We sometimes end up with a loop like:
> > > +
> > > +<bb 2>:
> > > + _9 = *arr_4(D);
> > > +
> > > +<bb 3>:
> > > + # _10 = PHI <_1(7), _9(2)>
> > > + # arr_11 = PHI <arr_7(7), arr_4(D)(2)>
> > > +
> > > + # Some uses of _10
> > > +
> > > +<bb 4>:
> > > + arr_7 = arr_11 + 1;
> > > + _1 = *arr_7;
> > > +
> > > + if (_1 != 0)
> > > + goto <bb 3>; [94.50%]
> > > + else
> > > + # Exit the loop
> > > +
> > > +With one use-def loop for the array pointer, and one for the defefrenced
> > > value
> > > +This gets in the way of vectorization. So we transform it to:
> > > +
> > > +<bb 2>:
> > > +
> > > +<bb 3>:
> > > + # arr_11 = PHI <arr_7(7), arr_4(D)(2)>
> > > + # _10 = *arr_11
> > > +
> > > +<bb 4>:
> > > + arr_7 = arr_11 + 1;
> > > + _1 = *arr_7;
> > > +
> > > + if (_1 != 0)
> > > + goto <bb 3>; [94.50%]
> > > + else
> > > + # Exit the loop
> > > + */
> > > +
> > > +unsigned
> > > +vect_try_rematerialize_loads (class loop *loop)
> > > +{
> > > + gphi_iterator gsi;
> > > + basic_block bb = loop->header;
> > > +
> > > + DUMP_VECT_SCOPE ("vect_try_rematerialize_loads");
> > > +
> > > + unsigned retval = 0;
> > > + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi);)
> > > + {
> > > + gphi *phi = gsi.phi ();
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location, "Looking at phi: %G",
> > > + (gimple *) phi);
> > > +
> > > + tree res = gimple_phi_result (phi);
> > > + if (TREE_CODE (res) != SSA_NAME)
> > > + continue;
> > > +
> > > + /* Only handle the cases where the phi has two arguments. */
> > > + if (gimple_phi_num_args (phi) != 2)
> > > + {
> > > + gsi_next (&gsi);
> > > + continue;
> > > + }
> > > +
> > > + tree left_arg = gimple_phi_arg_def (phi, 0);
> > > + tree right_arg = gimple_phi_arg_def (phi, 1);
> > > +
> > > + /* Only handle the cases where the two phi arguments are SSA
> > > names. */
> > > + if (TREE_CODE (left_arg) != SSA_NAME || TREE_CODE (right_arg) !=
> > > SSA_NAME)
> > > + {
> > > + gsi_next (&gsi);
> > > + continue;
> > > + }
> > > + gimple *left_arg_def = SSA_NAME_DEF_STMT (left_arg);
> > > + gimple *right_arg_def = SSA_NAME_DEF_STMT (right_arg);
> > > +
> > > + edge left_edge = gimple_phi_arg_edge (phi, 0);
> > > + edge right_edge = gimple_phi_arg_edge (phi, 1);
> > > +
> > > + /* Only handle the cases where the two phi arg SSA defs are loads
> > > (and not
> > > + also stores). */
> > > + if (!gimple_assign_load_p (left_arg_def)
> > > + || !gimple_assign_load_p (right_arg_def)
> > > + || gimple_store_p (left_arg_def) || gimple_store_p (right_arg_def))
> > > + {
> > > + gsi_next (&gsi);
> > > + continue;
> > > + }
> > > +
> > > + tree left_ref = gimple_assign_rhs1 (left_arg_def);
> > > + tree ref_type = TREE_TYPE (left_ref);
> > > + tree right_ref = gimple_assign_rhs1 (right_arg_def);
> > > + /* Only handle the cases where the two loads use MEM_REFS. */
> > > + if (TREE_CODE (left_ref) != MEM_REF || TREE_CODE (right_ref) !=
> > > MEM_REF)
> > > + {
> > > + gsi_next (&gsi);
> > > + continue;
> > > + }
> > > +
> > > + tree left_addr = TREE_OPERAND (left_ref, 0);
> > > + left_addr = tree_strip_nop_conversions (left_addr);
> > > + tree right_addr = TREE_OPERAND (right_ref, 0);
> > > + right_addr = tree_strip_nop_conversions (right_addr);
> > > + tree left_type = TREE_TYPE (left_addr);
> > > + tree right_type = TREE_TYPE (right_addr);
> > > + /* Only handle the cases where the two load types are compatible.
> > > */
> > > + if (!types_compatible_p (left_type, right_type))
> > > + {
> > > + gsi_next (&gsi);
> > > + continue;
> > > + }
> > > +
> > > + tree right_offset = TREE_OPERAND (right_ref, 1);
> > > + tree left_offset = TREE_OPERAND (right_ref, 1);
> > > +
> > > + /* Check the two loads use the same offsets. */
> > > + if (!operand_equal_p (left_offset, right_offset))
> > > + {
> > > + gsi_next (&gsi);
> > > + continue;
> > > + }
> > > + tree offset = left_offset;
> > > +
> > > + left_addr = tree_strip_nop_conversions (left_addr);
> > > + right_addr = tree_strip_nop_conversions (right_addr);
> > > +
> > > + if (dump_enabled_p ())
> > > + {
> > > + dump_printf_loc (MSG_NOTE, vect_location, "Left assign: %G",
> > > + (gimple *) left_arg_def);
> > > + dump_printf_loc (MSG_NOTE, vect_location, "Right assign: %G",
> > > + (gimple *) right_arg_def);
> > > + dump_printf_loc (MSG_NOTE, vect_location, "left addr: %T\n",
> > > + left_addr);
> > > + dump_printf_loc (MSG_NOTE, vect_location, "right addr: %T\n",
> > > + right_addr);
> > > + }
> > > +
> > > + /* Try find a phi node that is the address of the load phis. */
> > > + gphi_iterator addr_gsi;
> > > + gphi *address_phi = NULL;
> > > + for (addr_gsi = gsi_start_phis (bb); !gsi_end_p (addr_gsi);
> > > + gsi_next (&addr_gsi))
> > > + {
> > > + gphi *addr_phi = addr_gsi.phi ();
> > > +
> > > + /* Check we only have two phi nodes. */
> > > + if (addr_phi == phi || gimple_phi_num_args (addr_phi) != 2)
> > > + continue;
> > > + tree left_addr_arg = gimple_phi_arg_def (addr_phi, 0);
> > > + tree right_addr_arg = gimple_phi_arg_def (addr_phi, 1);
> > > +
> > > + left_addr_arg = tree_strip_nop_conversions (left_addr_arg);
> > > + right_addr_arg = tree_strip_nop_conversions (right_addr_arg);
> > > +
> > > + edge left_addr_edge = gimple_phi_arg_edge (addr_phi, 0);
> > > + edge right_addr_edge = gimple_phi_arg_edge (addr_phi, 1);
> > > +
> > > + /* Check this phi nodes edges match the original phi's nodes. */
> > > + if (left_addr_edge != left_edge || right_addr_edge != right_edge)
> > > + continue;
> > > +
> > > + /* Check neither the address nodes are loads. */
> > > + if (!gimple_assign_load_p (left_arg_def)
> > > + || !gimple_assign_load_p (right_arg_def))
> > > + continue;
> > > +
> > > + if (dump_enabled_p ())
> > > + {
> > > + dump_printf_loc (MSG_NOTE, vect_location, "left arg: %T\n",
> > > + left_addr_arg);
> > > + dump_printf_loc (MSG_NOTE, vect_location, "right arg: %T\n",
> > > + right_addr_arg);
> > > + }
> > > +
> > > + /* Check the addresses match between the load phi and the address
> > > + phi. */
> > > + if (!operand_equal_p (left_addr_arg, left_addr)
> > > + || !operand_equal_p (right_addr_arg, right_addr))
> > > + continue;
> > > +
> > > + tree left_arg_type = TREE_TYPE (left_addr_arg);
> > > + tree right_arg_type = TREE_TYPE (right_addr_arg);
> > > +
> > > + /* Check all the types are compatible */
> > > + if (!types_compatible_p (left_type, left_arg_type)
> > > + || !types_compatible_p (right_type, right_arg_type))
> > > + continue;
> > > +
> > > + address_phi = addr_phi;
> > > +
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location, "Matched phi: %G",
> > > + (gimple *) addr_phi);
> > > + }
> > > +
> > > + /* No address was found. */
> > > + if (!address_phi)
> > > + {
> > > + gsi_next (&gsi);
> > > + continue;
> > > + }
> > > +
> > > + // Yay! Change the phi to instead be a load of the other phi.
> > > + if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location, "Removing phi: %G",
> > > + (gimple *) phi);
> > > +
> > > + tree addr = gimple_phi_result (address_phi);
> > > + tree mem = build2 (MEM_REF, ref_type, addr, offset);
> > > + gassign *load = gimple_build_assign (res, mem);
> > > + remove_phi_node (&gsi, false);
> > > + gimple_stmt_iterator gsi_add = gsi_after_labels (bb);
> > > + gsi_insert_before (&gsi_add, load, GSI_SAME_STMT);
> > > + retval |= TODO_update_ssa;
> > > + }
> > > + return retval;
> > > +}
> > > +
> > > +/* Tree load-rematerialization pass management. */
> > > +
> > > +namespace {
> > > +
> > > +const pass_data pass_data_rematerialize_loads = {
> > > + GIMPLE_PASS, /* type */
> > > + "rematerialize_loads", /* name */
> > > + OPTGROUP_NONE, /* optinfo_flags */
> > > + TV_TREE_REMATERIALIZE_LOADS, /* tv_id */
> > > + (PROP_cfg | PROP_ssa), /* properties_required */
> > > + 0, /* properties_provided */
> > > + 0, /* properties_destroyed */
> > > + 0, /* todo_flags_start */
> > > + 0 /* todo_flags_finish */
> > > +};
> > > +
> > > +class pass_rematerialize_loads : public gimple_opt_pass
> > > +{
> > > +public:
> > > + pass_rematerialize_loads (gcc::context *ctxt)
> > > + : gimple_opt_pass (pass_data_rematerialize_loads, ctxt)
> > > + {}
> > > +
> > > + /* opt_pass methods: */
> > > + bool gate (function *) final override;
> > > + unsigned int execute (function *) final override;
> > > +
> > > +}; // class pass_rematerialize_loads
> > > +
> > > +bool
> > > +pass_rematerialize_loads::gate (function *fun)
> > > +{
> > > + return (flag_tree_loop_vectorize || fun->has_force_vectorize_loops);
> > > +}
> > > +
> > > +unsigned int
> > > +pass_rematerialize_loads::execute (function *fun)
> > > +{
> > > + unsigned todo = 0;
> > > +
> > > + if (number_of_loops (fun) <= 1)
> > > + return 0;
> > > +
> > > + auto_vec<gimple *> preds;
> > > + for (auto loop : loops_list (cfun, 0))
> > > + if (flag_tree_loop_if_convert == 1
> > > + || ((flag_tree_loop_vectorize || loop->force_vectorize)
> > > + && !loop->dont_vectorize))
> > > + todo |= vect_try_rematerialize_loads (loop);
> > > +
> > > + if (todo & TODO_update_ssa_any)
> > > + update_ssa (todo & TODO_update_ssa_any);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +} // namespace
> > > +
> > > +gimple_opt_pass *
> > > +make_pass_rematerialize_loads (gcc::context *ctxt)
> > > +{
> > > + return new pass_rematerialize_loads (ctxt);
> > > +}
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-rematerialize.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break-rematerialize.c
> > > new file mode 100644
> > > index 00000000000..17b9fa13fa4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-rematerialize.c
> > > @@ -0,0 +1,16 @@
> > > +/* { dg-add-options vect_early_break } */
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target vect_early_break } */
> > > +/* { dg-require-effective-target vect_int } */
> > > +
> > > +/* { dg-additional-options "-Ofast" } */
> > > +
> > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > +
> > > +int foo(char *arr, char c) {
> > > + while (*arr) {
> > > + if (*arr == c) return 1;
> > > + arr++;
> > > + }
> > > + return 0;
> > > +}
> > > diff --git a/gcc/timevar.def b/gcc/timevar.def
> > > index 3824caa01bc..a756fdfc05a 100644
> > > --- a/gcc/timevar.def
> > > +++ b/gcc/timevar.def
> > > @@ -318,6 +318,7 @@ DEFTIMEVAR (TV_TREE_UBSAN , "tree ubsan")
> > > DEFTIMEVAR (TV_INITIALIZE_RTL , "initialize rtl")
> > > DEFTIMEVAR (TV_GIMPLE_LADDRESS , "address lowering")
> > > DEFTIMEVAR (TV_TREE_LOOP_IFCVT , "tree loop if-conversion")
> > > +DEFTIMEVAR (TV_TREE_REMATERIALIZE_LOADS , "tree rematerialize
> > > loads")
> > > DEFTIMEVAR (TV_WARN_ACCESS , "access analysis")
> > > DEFTIMEVAR (TV_GIMPLE_CRC_OPTIMIZATION, "crc optimization")
> > > DEFTIMEVAR (TV_EXT_DCE , "ext dce")
> > > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > > index b3c97658a8f..b8a96b3cc30 100644
> > > --- a/gcc/tree-pass.h
> > > +++ b/gcc/tree-pass.h
> > > @@ -387,6 +387,7 @@ extern gimple_opt_pass *make_pass_empty_loop
> > > (gcc::context *ctxt);
> > > extern gimple_opt_pass *make_pass_graphite (gcc::context *ctxt);
> > > extern gimple_opt_pass *make_pass_graphite_transforms (gcc::context
> > > *ctxt);
> > > extern gimple_opt_pass *make_pass_if_conversion (gcc::context *ctxt);
> > > +extern gimple_opt_pass *make_pass_rematerialize_loads (gcc::context
> > > *ctxt);
> > > extern gimple_opt_pass *make_pass_if_to_switch (gcc::context *ctxt);
> > > extern gimple_opt_pass *make_pass_loop_distribution (gcc::context *ctxt);
> > > extern gimple_opt_pass *make_pass_crc_optimization (gcc::context *ctxt);
> > >
> >
> > --
> > Richard Biener <[email protected]>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
>
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)