On Fri, May 15, 2026 at 11:04 PM Daniel Henrique Barboza
<[email protected]> wrote:
>
>
>
> On 5/15/2026 2:46 PM, Daniel Henrique Barboza wrote:
> >
> >
> > On 5/9/2026 6:00 PM, Jeffrey Law wrote:
> >>
> >>
> >>
> >> On 4/21/2026 7:30 AM, Daniel Henrique Barboza wrote:
> >>> From: Daniel Barboza <[email protected]>
> >>>
> >>> Consider the following code that checks if a given bit is set, setting
> >>> it in case it isn't:
> >>>
> >>> bit_val = 1 << num;
> >>> if ((ptr[x] & bit_val) == 0)
> >>>     {
> >>>       ptr[x] |= bit_val;
> >>>     }
> >>> return ptr[x];
> >>>
> >>> The generated gimple is something similar to:
> >>>
> >>>     ;; basic block 2
> >>>     bitshift_6 = 1 << bit_5(D);
> >>>     # VUSE <.MEM_7(D)>
> >>>     _1 = arrD.4593[n_8(D)];
> >>>     _2 = _1 & bitshift_6;
> >>>     if (_2 == 0) goto <bb 3>; else goto <bb 4>;
> >>>
> >>>     ;; basic block 3
> >>>     _3 = _1 | bitshift_6;
> >>>     # .MEM_9 = VDEF <.MEM_7(D)>
> >>>     arrD.4593[n_8(D)] = _3;
> >>>     ;;    succ:       4 [always]  (FALLTHRU,EXECUTABLE)
> >>>
> >>>     ;;   basic block 4,
> >>>     ;;    prev block 3
> >>>     # .MEM_4 = PHI <.MEM_7(D)(2), .MEM_9(3)>
> >>>     # VUSE <.MEM_4>
> >>>     _10 = arrD.4593[n_8(D)];
> >>>     # .MEM_11 = VDEF <.MEM_4>
> >>>     arrD.4593 ={v} {CLOBBER(eos)};
> >>>     # VUSE <.MEM_11>
> >>>     return _10;
> >>>
> >>> If we have the right conditions (e.g. we don't have store data races to
> >>> worry about, we're not dealing with read-only memory) we can move the
> >>> bitset operation to the cond_bb (block 2 in the example), removing the
> >>> potential branch mispredict, as long as we're able to identify this "bit
> >>> N is either already set or will end up being set" scenario:
> >>>
> >>>     bitshift_6 = 1 << bit_5(D);
> >>>     # VUSE <.MEM_7(D)>
> >>>     _1 = arrD.4593[n_8(D)];
> >>>     _2 = _1 & bitshift_6;
> >>>     _3 = _1 | bitshift_6;
> >>>     # .MEM_9 = VDEF <.MEM_7(D)>
> >>>     arrD.4593[n_8(D)] = _3;
> >>>     if (_2 == 0) goto <bb 3>; else goto <bb 4>;
> >>>
> >>>     ;; basic block 3
> >>>     ;;    succ:       4 [always]  (FALLTHRU,EXECUTABLE)
> >>>     (...)
> >>>
> >>> If the bitcheck result isn't being used as a PHI result there's a good
> >>> chance that this optimization will get rid of both the bitcheck and the
> >>> gcond.  The 'optimized' dump for the example above looks like this:
> >>>
> >>>     ;;   basic block 2
> >>>     ;;    prev block 0
> >>>     bitshift_6 = 1 << bit_5(D);
> >>>     # VUSE <.MEM_7(D)>
> >>>     _1 = arrD.4593[n_8(D)];
> >>>     _3 = _1 | bitshift_6;
> >>>     # .MEM_11 = VDEF <.MEM_7(D)>
> >>>     arrD.4593 ={v} {CLOBBER(eos)};
> >>>     # VUSE <.MEM_11>
> >>>     return _3;
> >>>
> >>> This optimization was motivated by GCC's bitmap_set_bit() before
> >>> PR119482.  We're also covering the bitclear equivalent of this opt
> >>> (check if a bit is set, if positive clear it).  The bitset
> >>> transformation only works for single bits.  The bitclear variation
> >>> can handle single or multiple bit masks.
> >>>
> >>> Bootstrapped and regression tested in x86, aarch64 and RISC-V.
> >>>
> >>>     PR tree-optimization/124667
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>     * tree-ssa-phiopt.cc (stmt_is_memory_load_assignment): helper to
> >>>     check if a gimple stmt is a memory load.
> >>>     (stmt_is_memory_store_assignment): helper to check if a gimple
> >>>     stmt is a memory store.
> >>>     (cond_removal_mispredict_validate_memregs): helper to check if a
> >>>     memory load and a memory store are using the same memory address.
> >>>     (cond_removal_mispredict_valid_bitmask): helper to validate if
> >>>     the bit/bitmask is valid for the current optimization.
> >>>     (cond_removal_mispredict_check_cond): helper to validate the
> >>>     gcond and cond_stmt format.
> >>>     (cond_removal_mispredict_memop): new cselim optimization that,
> >>>     after doing checks and validations, move a bitset/bitclear
> >>>     operation to the end of cond_bb, making it unconditional.
> >>>     (pass_cselim::execute): call cond_removal_mispredict_memop.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>     * gcc.dg/tree-ssa/pr124667.c: New test.
> >>> ---
> >>>
> >>> Changes from v1:
> >>> - simplified stmt_is_memory_load_assignment
> >>> - simplified stmt_is_memory_store_assignment
> >>> - removed vuse/vdef check, use operand_equal_p instead
> >>> - added tree_could_trap_p check
> >>> - v1 link: 
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2026-April/712427.html
> >>>
> >>>    gcc/testsuite/gcc.dg/tree-ssa/pr124667.c |  77 ++++
> >>>    gcc/tree-ssa-phiopt.cc                   | 428 ++++++++++++++++++++++-
> >>>    2 files changed, 500 insertions(+), 5 deletions(-)
> >>>    create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr124667.c
> >>>
> >>>
> >>>
> >>>
> >>> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> >>> index 0bf7e58b8f0..9230e51311a 100644
> >>> --- a/gcc/tree-ssa-phiopt.cc
> >>> +++ b/gcc/tree-ssa-phiopt.cc
> >>> @@ -3115,6 +3115,421 @@ cond_store_replacement (basic_block middle_bb, 
> >>> basic_block join_bb,
> >>>      return true;
> >>>    }
> >>> +/* Return TRUE if STMT is a memory load, FALSE otherwise.  */
> >>> +
> >>> +static bool
> >>> +stmt_is_memory_load_assignment (gimple *stmt)
> >>> +{
> >>> +  if (!stmt
> >>> +      || !gimple_assign_single_p (stmt)
> >>> +      || TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME
> >>> +      || is_gimple_reg (gimple_assign_rhs1 (stmt))
> >>> +      || !gimple_references_memory_p (stmt))
> >>> +    return false;
> >>> +
> >>> +  return true;
> >>> +}
> >> Will this trigger for a GIMPLE_CALL with a return value?  That's going
> >> to be an assignment, the LHS will be an SSA_NAME, if it's an indirect
> >> call, the RHS *might* be an SSA_NAME (I'm just not sure offhand) and it
> >> should have memory operands (VUSE/VDEF) by nature of being a call and
> >> potentially writing memory.
> >
> >
> > we're checking for gimple_assign_single_p(), which in turn checks for:
> >
> >    return (is_gimple_assign (gs)
> >            && gimple_assign_rhs_class (gs) == GIMPLE_SINGLE_RHS);
> >
> >
> > "is_gimple_assign (gs)" checks for "gimple_code (gs) == GIMPLE_ASSIGN".
> >
> > So this won't trigger for GIMPLE_CALL since that would mean
> > gimple_code (gs) == GIMPLE_CALL.
> >
> >
> >
> >>
> >> This, and it's companion work by identify things to reject and assume
> >> everything is OK if it doesn't find something to reject. Would it be
> >> better to write a positive test instead.  ie, the statement must have
> >> properties x, y and z and perhaps not properties p, q and r?
> >
> > I suppose we can make something like this (untested):
> >
> > static bool
> > stmt_is_memory_load_assignment (gassign *stmt)
> > {
> >    if (!stmt)
> >      return false;
> >
> >    tree rhs1 = gimple_assign_rhs1 (stmt);
> >
> >    if (gimple_assign_single_p (stmt)
> >        && gimple_references_memory_p (stmt)
> >        && !gimple_has_volatile_ops (stmt)
> >        && (REFERENCE_CLASS_P (rhs1) || DECL_P (rhs1))
> >        && is_gimple_reg_type (TREE_TYPE (rhs1)))
> >      return true;
> >
> >    return false;
> > }
>
>
> This is all wrong :( I was reading the v1 version of the patch in my
> editor when I replied...
>
> Using the correct version now, what we can to turn it into a positive
> test:
>
> static bool
> stmt_is_memory_load_assignment (gimple *stmt)
> {
>    return stmt
>           && gimple_assign_single_p (stmt)
>           && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
>           && gimple_references_memory_p (stmt)
>           && !is_gimple_reg (gimple_assign_rhs1 (stmt)));
> }
>
>
> I'll do the same with stmt_is_memory_store_assignment().

There is already gimple_assign_load_p and gimple_store_p (which
needs && is_gimple_assign to rule out calls)

>
>
> Cheers,
> Daniel
>
> >
> > Arguably it's a bit easier to read.
> >
> >>> +
> >>> +/* cond_removal_mispredict_memop helper that checks if a
> >>> +   given memreg operand of a bitop_stmt is a memory load,
> >>> +   and it loads the same mem addr that is later stored
> >>> +   in store_stmt:
> >>> +
> >>> +   # VUSE <.MEM_11>
> >>> +   _1 = ptr_10->bits[word_num_12]; (load_stmt)
> >>> +   (...)
> >>> +   _3 = _1 OP bitmask;           (bitop_stmt)
> >>> +   # .MEM_14 = VDEF <.MEM_11>
> >>> +   ptr_10->bits[word_num_12] = _3; (store_stmt)
> >>> +
> >>> +   For the case above "_1" matches the criteria.
> >>> +
> >>> +   We're also validating whether store_stmt supports the
> >>> +   transformation by testing its LHS for read-only memory
> >>> +   and store data races.  */
> >>> +
> >>> +static bool
> >>> +cond_removal_mispredict_validate_memregs (gimple *store_stmt,
> >>> +                      tree memreg,
> >>> +                      hash_set<tree> *nontrap)
> >>> +{
> >>> +  gimple *load_stmt = SSA_NAME_DEF_STMT (memreg);
> >>> +
> >>> +  if (!operand_equal_p (gimple_assign_rhs1 (load_stmt),
> >>> +            gimple_assign_lhs (store_stmt)))
> >> Do you have to worry about default definitions here?  Essentially a
> >> default definition is not going to be a gimple assignment and thus
> >> referencing gimple_assign_* is wrong.  I think there's IS_DEFAULT_DEF or
> >> something like that you can use to guard against this issue.
> >
> > I wasn't aware of IS_DEFAULT_DEF.  I assumed that every SSA_NAME_DEF_STMT()
> > would either return a gimple assign or a NULL.
> >
> > I'll add checks for it.
> >
> >
> >>
> >> What I don't immediately see is confirmation that the stored address is
> >> safe to write to outside the context of the inner block conditional.
> >> The particular case I'd worry about would be if the address happens to
> >> be readonly in some contexts?  Does NONTRAP provide you with the
> >> necessary information to make this safe?
> >
> > We're doing read-only and trap and data race checks later in the process.
> >
> >>
> >>
> >>
> >>> +
> >>> +/* Check if a given node represents a valid bitmask for
> >>> +   the cond_removal_mispredict_memop transformation:
> >>> +   single bit mask for unconditional bit set, multiple
> >>> +   bits mask for unconditional bit clear.  */
> >>> +
> >>> +static bool
> >>> +cond_removal_mispredict_valid_bitmask (tree bitmask, bool 
> >>> only_single_bit)
> >>> +{
> >>> +  if (TREE_CODE (bitmask) == INTEGER_CST)
> >>> +    {
> >>> +      if (!only_single_bit)
> >>> +    return true;
> >>> +      return wi::popcount (wi::to_wide (bitmask)) == 1;
> >>> +    }
> >>> +
> >>> +  /* There are several ops that can generate any bitmask, but in this
> >>> +     case we want to detect "SSA_NAME = 1 << X" that represents a
> >>> +     single bit mask.  */
> >>> +  if (TREE_CODE (bitmask) == SSA_NAME)
> >>> +    {
> >>> +      gimple *def_stmt = SSA_NAME_DEF_STMT (bitmask);
> >>> +      return def_stmt
> >>> +         && gimple_assign_rhs_code (def_stmt) == LSHIFT_EXPR
> >>> +         && integer_onep (gimple_assign_rhs1 (def_stmt));
> >> Probably need to verify you have a gimple assignment statement before
> >> you start looking at the assign_rhs_code or assign_rhs1.
> >>> +  gimple *cond_stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (cond));
> >> Similarly here.  You probably need to audit for this problem throughout
> >> the patch.
> >>> +
> >>> +static bool
> >>> +cond_removal_mispredict_memop (basic_block cond_bb,
> >>> +                   basic_block middle_bb,
> >>> +                   basic_block join_bb,
> >>> +                   hash_set<tree> *nontrap)
> >>> +{
> >>> +  /* 'middle_bb' must have no PHI nodes, it must come via a
> >>> +     TRUE_VALUE edge, and it must have a store preceeding
> >>> +     a bitop:
> >>> +
> >>> +     _3 = _1 BITOP bitmask;
> >>> +     # .MEM_14 = VDEF <.MEM_11>
> >>> +     ptr_10->bits[word_num_12] = _3;  */
> >>> +  if (!gimple_seq_empty_p (phi_nodes (middle_bb)))
> >>> +    return false;
> >>> +
> >>> +  edge e_cond_middle = single_pred_edge (middle_bb);
> >>> +  if (!(e_cond_middle->flags & EDGE_TRUE_VALUE))
> >>> +    return false;
> >>> +
> >>> +  gimple_stmt_iterator gsi = gsi_last_nondebug_bb (middle_bb);
> >> [ ... ]
> >> Do you need to verify that there's nothing else in MIDDLE_BB that
> >> doesn't feed into the memory operations we're looking to optimize? ie,
> >> what if there's something like a = a + 1 in MIDDLE_BB somewhere?  That's
> >> effectively a conditional add (since MIDDLE_BB is conditiionally skipped).
> >
> > This is done a bit later here:
> >
> >
> >    /* Verify that after bitop_stmt we only have store_stmt.  */
> >    gsi_next (&gsi);
> >    if (gsi_stmt (gsi) != store_stmt)
> >      return false;
> >
> >
> > And yes, we can't do this transformation if there's additional stuff that
> > we aren't anticipating in middle_bb.
> >
> >
> >>
> >> My brain has reached its limit today.   Let's get the stuff above
> >> resolved and get an update out the door.  I suspect we're not done
> >> iterating yet.  I'm increasingly worried about the readonly memory problem.
> >
> >
> > I'll send a v3 with the appropriate SSA_NAME_DEF_STMT() guards and a more
> > readable stmt_is_memload/memstore check.
> >
> >
> > Cheers,
> > Daniel
> >
> >>
> >> jeff
> >
>

Reply via email to