On 4/21/2026 7:30 AM, Daniel Henrique Barboza wrote:
> From: Daniel Barboza <[email protected]>
>
> Consider the following code that checks if a given bit is set, setting
> it in case it isn't:
>
> bit_val = 1 << num;
> if ((ptr[x] & bit_val) == 0)
>    {
>      ptr[x] |= bit_val;
>    }
> return ptr[x];
>
> The generated gimple is something similar to:
>
>    ;; basic block 2
>    bitshift_6 = 1 << bit_5(D);
>    # VUSE <.MEM_7(D)>
>    _1 = arrD.4593[n_8(D)];
>    _2 = _1 & bitshift_6;
>    if (_2 == 0) goto <bb 3>; else goto <bb 4>;
>
>    ;; basic block 3
>    _3 = _1 | bitshift_6;
>    # .MEM_9 = VDEF <.MEM_7(D)>
>    arrD.4593[n_8(D)] = _3;
>    ;;    succ:       4 [always]  (FALLTHRU,EXECUTABLE)
>
>    ;;   basic block 4,
>    ;;    prev block 3
>    # .MEM_4 = PHI <.MEM_7(D)(2), .MEM_9(3)>
>    # VUSE <.MEM_4>
>    _10 = arrD.4593[n_8(D)];
>    # .MEM_11 = VDEF <.MEM_4>
>    arrD.4593 ={v} {CLOBBER(eos)};
>    # VUSE <.MEM_11>
>    return _10;
>
> If we have the right conditions (e.g. we don't have store data races to
> worry about, we're not dealing with read-only memory) we can move the
> bitset operation to the cond_bb (block 2 in the example), removing the
> potential branch mispredict, as long as we're able to identify this "bit
> N is either already set or will end up being set" scenario:
>
>    bitshift_6 = 1 << bit_5(D);
>    # VUSE <.MEM_7(D)>
>    _1 = arrD.4593[n_8(D)];
>    _2 = _1 & bitshift_6;
>    _3 = _1 | bitshift_6;
>    # .MEM_9 = VDEF <.MEM_7(D)>
>    arrD.4593[n_8(D)] = _3;
>    if (_2 == 0) goto <bb 3>; else goto <bb 4>;
>
>    ;; basic block 3
>    ;;    succ:       4 [always]  (FALLTHRU,EXECUTABLE)
>    (...)
>
> If the bitcheck result isn't being used as a PHI result there's a good
> chance that this optimization will get rid of both the bitcheck and the
> gcond.  The 'optimized' dump for the example above looks like this:
>
>    ;;   basic block 2
>    ;;    prev block 0
>    bitshift_6 = 1 << bit_5(D);
>    # VUSE <.MEM_7(D)>
>    _1 = arrD.4593[n_8(D)];
>    _3 = _1 | bitshift_6;
>    # .MEM_11 = VDEF <.MEM_7(D)>
>    arrD.4593 ={v} {CLOBBER(eos)};
>    # VUSE <.MEM_11>
>    return _3;
>
> This optimization was motivated by GCC's bitmap_set_bit() before
> PR119482.  We're also covering the bitclear equivalent of this opt
> (check if a bit is set, if positive clear it).  The bitset
> transformation only works for single bits.  The bitclear variation
> can handle single or multiple bit masks.
>
> Bootstrapped and regression tested in x86, aarch64 and RISC-V.
>
>       PR tree-optimization/124667
>
> gcc/ChangeLog:
>
>       * tree-ssa-phiopt.cc (stmt_is_memory_load_assignment): helper to
>       check if a gimple stmt is a memory load.
>       (stmt_is_memory_store_assignment): helper to check if a gimple
>       stmt is a memory store.
>       (cond_removal_mispredict_validate_memregs): helper to check if a
>       memory load and a memory store are using the same memory address.
>       (cond_removal_mispredict_valid_bitmask): helper to validate if
>       the bit/bitmask is valid for the current optimization.
>       (cond_removal_mispredict_check_cond): helper to validate the
>       gcond and cond_stmt format.
>       (cond_removal_mispredict_memop): new cselim optimization that,
>       after doing checks and validations, move a bitset/bitclear
>       operation to the end of cond_bb, making it unconditional.
>       (pass_cselim::execute): call cond_removal_mispredict_memop.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.dg/tree-ssa/pr124667.c: New test.
> ---
>
> Changes from v1:
> - simplified stmt_is_memory_load_assignment
> - simplified stmt_is_memory_store_assignment
> - removed vuse/vdef check, use operand_equal_p instead
> - added tree_could_trap_p check
> - v1 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-April/712427.html
>
>   gcc/testsuite/gcc.dg/tree-ssa/pr124667.c |  77 ++++
>   gcc/tree-ssa-phiopt.cc                   | 428 ++++++++++++++++++++++-
>   2 files changed, 500 insertions(+), 5 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr124667.c
>
>
>
>
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 0bf7e58b8f0..9230e51311a 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -3115,6 +3115,421 @@ cond_store_replacement (basic_block middle_bb, 
> basic_block join_bb,
>     return true;
>   }
>   
> +/* Return TRUE if STMT is a memory load, FALSE otherwise.  */
> +
> +static bool
> +stmt_is_memory_load_assignment (gimple *stmt)
> +{
> +  if (!stmt
> +      || !gimple_assign_single_p (stmt)
> +      || TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME
> +      || is_gimple_reg (gimple_assign_rhs1 (stmt))
> +      || !gimple_references_memory_p (stmt))
> +    return false;
> +
> +  return true;
> +}
Will this trigger for a GIMPLE_CALL with a return value?  That's going 
to be an assignment, the LHS will be an SSA_NAME, if it's an indirect 
call, the RHS *might* be an SSA_NAME (I'm just not sure offhand) and it 
should have memory operands (VUSE/VDEF) by nature of being a call and 
potentially writing memory.

This, and it's companion work by identify things to reject and assume 
everything is OK if it doesn't find something to reject. Would it be 
better to write a positive test instead.  ie, the statement must have 
properties x, y and z and perhaps not properties p, q and r?
> +
> +/* cond_removal_mispredict_memop helper that checks if a
> +   given memreg operand of a bitop_stmt is a memory load,
> +   and it loads the same mem addr that is later stored
> +   in store_stmt:
> +
> +   # VUSE <.MEM_11>
> +   _1 = ptr_10->bits[word_num_12]; (load_stmt)
> +   (...)
> +   _3 = _1 OP bitmask;                  (bitop_stmt)
> +   # .MEM_14 = VDEF <.MEM_11>
> +   ptr_10->bits[word_num_12] = _3; (store_stmt)
> +
> +   For the case above "_1" matches the criteria.
> +
> +   We're also validating whether store_stmt supports the
> +   transformation by testing its LHS for read-only memory
> +   and store data races.  */
> +
> +static bool
> +cond_removal_mispredict_validate_memregs (gimple *store_stmt,
> +                                       tree memreg,
> +                                       hash_set<tree> *nontrap)
> +{
> +  gimple *load_stmt = SSA_NAME_DEF_STMT (memreg);
> +
> +  if (!operand_equal_p (gimple_assign_rhs1 (load_stmt),
> +                     gimple_assign_lhs (store_stmt)))
Do you have to worry about default definitions here?  Essentially a 
default definition is not going to be a gimple assignment and thus 
referencing gimple_assign_* is wrong.  I think there's IS_DEFAULT_DEF or 
something like that you can use to guard against this issue.

What I don't immediately see is confirmation that the stored address is 
safe to write to outside the context of the inner block conditional.  
The particular case I'd worry about would be if the address happens to 
be readonly in some contexts?  Does NONTRAP provide you with the 
necessary information to make this safe?



> +
> +/* Check if a given node represents a valid bitmask for
> +   the cond_removal_mispredict_memop transformation:
> +   single bit mask for unconditional bit set, multiple
> +   bits mask for unconditional bit clear.  */
> +
> +static bool
> +cond_removal_mispredict_valid_bitmask (tree bitmask, bool only_single_bit)
> +{
> +  if (TREE_CODE (bitmask) == INTEGER_CST)
> +    {
> +      if (!only_single_bit)
> +     return true;
> +      return wi::popcount (wi::to_wide (bitmask)) == 1;
> +    }
> +
> +  /* There are several ops that can generate any bitmask, but in this
> +     case we want to detect "SSA_NAME = 1 << X" that represents a
> +     single bit mask.  */
> +  if (TREE_CODE (bitmask) == SSA_NAME)
> +    {
> +      gimple *def_stmt = SSA_NAME_DEF_STMT (bitmask);
> +      return def_stmt
> +          && gimple_assign_rhs_code (def_stmt) == LSHIFT_EXPR
> +          && integer_onep (gimple_assign_rhs1 (def_stmt));
Probably need to verify you have a gimple assignment statement before 
you start looking at the assign_rhs_code or assign_rhs1.
> +  gimple *cond_stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (cond));
Similarly here.  You probably need to audit for this problem throughout 
the patch.
> +
> +static bool
> +cond_removal_mispredict_memop (basic_block cond_bb,
> +                            basic_block middle_bb,
> +                            basic_block join_bb,
> +                            hash_set<tree> *nontrap)
> +{
> +  /* 'middle_bb' must have no PHI nodes, it must come via a
> +     TRUE_VALUE edge, and it must have a store preceeding
> +     a bitop:
> +
> +     _3 = _1 BITOP bitmask;
> +     # .MEM_14 = VDEF <.MEM_11>
> +     ptr_10->bits[word_num_12] = _3;  */
> +  if (!gimple_seq_empty_p (phi_nodes (middle_bb)))
> +    return false;
> +
> +  edge e_cond_middle = single_pred_edge (middle_bb);
> +  if (!(e_cond_middle->flags & EDGE_TRUE_VALUE))
> +    return false;
> +
> +  gimple_stmt_iterator gsi = gsi_last_nondebug_bb (middle_bb);
[ ... ]
Do you need to verify that there's nothing else in MIDDLE_BB that 
doesn't feed into the memory operations we're looking to optimize? ie, 
what if there's something like a = a + 1 in MIDDLE_BB somewhere?  That's 
effectively a conditional add (since MIDDLE_BB is conditiionally skipped).

My brain has reached its limit today.   Let's get the stuff above 
resolved and get an update out the door.  I suspect we're not done 
iterating yet.  I'm increasingly worried about the readonly memory problem.

jeff

Reply via email to