On 5/17/2026 12:55 PM, Richard Biener wrote:
On Fri, May 15, 2026 at 11:04 PM Daniel Henrique Barboza
<[email protected]> wrote:
On 5/15/2026 2:46 PM, Daniel Henrique Barboza wrote:
On 5/9/2026 6:00 PM, Jeffrey Law wrote:
On 4/21/2026 7:30 AM, Daniel Henrique Barboza wrote:
From: Daniel Barboza <[email protected]>
Consider the following code that checks if a given bit is set, setting
it in case it isn't:
bit_val = 1 << num;
if ((ptr[x] & bit_val) == 0)
{
ptr[x] |= bit_val;
}
return ptr[x];
The generated gimple is something similar to:
;; basic block 2
bitshift_6 = 1 << bit_5(D);
# VUSE <.MEM_7(D)>
_1 = arrD.4593[n_8(D)];
_2 = _1 & bitshift_6;
if (_2 == 0) goto <bb 3>; else goto <bb 4>;
;; basic block 3
_3 = _1 | bitshift_6;
# .MEM_9 = VDEF <.MEM_7(D)>
arrD.4593[n_8(D)] = _3;
;; succ: 4 [always] (FALLTHRU,EXECUTABLE)
;; basic block 4,
;; prev block 3
# .MEM_4 = PHI <.MEM_7(D)(2), .MEM_9(3)>
# VUSE <.MEM_4>
_10 = arrD.4593[n_8(D)];
# .MEM_11 = VDEF <.MEM_4>
arrD.4593 ={v} {CLOBBER(eos)};
# VUSE <.MEM_11>
return _10;
If we have the right conditions (e.g. we don't have store data races to
worry about, we're not dealing with read-only memory) we can move the
bitset operation to the cond_bb (block 2 in the example), removing the
potential branch mispredict, as long as we're able to identify this "bit
N is either already set or will end up being set" scenario:
bitshift_6 = 1 << bit_5(D);
# VUSE <.MEM_7(D)>
_1 = arrD.4593[n_8(D)];
_2 = _1 & bitshift_6;
_3 = _1 | bitshift_6;
# .MEM_9 = VDEF <.MEM_7(D)>
arrD.4593[n_8(D)] = _3;
if (_2 == 0) goto <bb 3>; else goto <bb 4>;
;; basic block 3
;; succ: 4 [always] (FALLTHRU,EXECUTABLE)
(...)
If the bitcheck result isn't being used as a PHI result there's a good
chance that this optimization will get rid of both the bitcheck and the
gcond. The 'optimized' dump for the example above looks like this:
;; basic block 2
;; prev block 0
bitshift_6 = 1 << bit_5(D);
# VUSE <.MEM_7(D)>
_1 = arrD.4593[n_8(D)];
_3 = _1 | bitshift_6;
# .MEM_11 = VDEF <.MEM_7(D)>
arrD.4593 ={v} {CLOBBER(eos)};
# VUSE <.MEM_11>
return _3;
This optimization was motivated by GCC's bitmap_set_bit() before
PR119482. We're also covering the bitclear equivalent of this opt
(check if a bit is set, if positive clear it). The bitset
transformation only works for single bits. The bitclear variation
can handle single or multiple bit masks.
Bootstrapped and regression tested in x86, aarch64 and RISC-V.
PR tree-optimization/124667
gcc/ChangeLog:
* tree-ssa-phiopt.cc (stmt_is_memory_load_assignment): helper to
check if a gimple stmt is a memory load.
(stmt_is_memory_store_assignment): helper to check if a gimple
stmt is a memory store.
(cond_removal_mispredict_validate_memregs): helper to check if a
memory load and a memory store are using the same memory address.
(cond_removal_mispredict_valid_bitmask): helper to validate if
the bit/bitmask is valid for the current optimization.
(cond_removal_mispredict_check_cond): helper to validate the
gcond and cond_stmt format.
(cond_removal_mispredict_memop): new cselim optimization that,
after doing checks and validations, move a bitset/bitclear
operation to the end of cond_bb, making it unconditional.
(pass_cselim::execute): call cond_removal_mispredict_memop.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr124667.c: New test.
---
Changes from v1:
- simplified stmt_is_memory_load_assignment
- simplified stmt_is_memory_store_assignment
- removed vuse/vdef check, use operand_equal_p instead
- added tree_could_trap_p check
- v1 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-April/712427.html
gcc/testsuite/gcc.dg/tree-ssa/pr124667.c | 77 ++++
gcc/tree-ssa-phiopt.cc | 428 ++++++++++++++++++++++-
2 files changed, 500 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr124667.c
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 0bf7e58b8f0..9230e51311a 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -3115,6 +3115,421 @@ cond_store_replacement (basic_block middle_bb,
basic_block join_bb,
return true;
}
+/* Return TRUE if STMT is a memory load, FALSE otherwise. */
+
+static bool
+stmt_is_memory_load_assignment (gimple *stmt)
+{
+ if (!stmt
+ || !gimple_assign_single_p (stmt)
+ || TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME
+ || is_gimple_reg (gimple_assign_rhs1 (stmt))
+ || !gimple_references_memory_p (stmt))
+ return false;
+
+ return true;
+}
Will this trigger for a GIMPLE_CALL with a return value? That's going
to be an assignment, the LHS will be an SSA_NAME, if it's an indirect
call, the RHS *might* be an SSA_NAME (I'm just not sure offhand) and it
should have memory operands (VUSE/VDEF) by nature of being a call and
potentially writing memory.
we're checking for gimple_assign_single_p(), which in turn checks for:
return (is_gimple_assign (gs)
&& gimple_assign_rhs_class (gs) == GIMPLE_SINGLE_RHS);
"is_gimple_assign (gs)" checks for "gimple_code (gs) == GIMPLE_ASSIGN".
So this won't trigger for GIMPLE_CALL since that would mean
gimple_code (gs) == GIMPLE_CALL.
This, and it's companion work by identify things to reject and assume
everything is OK if it doesn't find something to reject. Would it be
better to write a positive test instead. ie, the statement must have
properties x, y and z and perhaps not properties p, q and r?
I suppose we can make something like this (untested):
static bool
stmt_is_memory_load_assignment (gassign *stmt)
{
if (!stmt)
return false;
tree rhs1 = gimple_assign_rhs1 (stmt);
if (gimple_assign_single_p (stmt)
&& gimple_references_memory_p (stmt)
&& !gimple_has_volatile_ops (stmt)
&& (REFERENCE_CLASS_P (rhs1) || DECL_P (rhs1))
&& is_gimple_reg_type (TREE_TYPE (rhs1)))
return true;
return false;
}
This is all wrong :( I was reading the v1 version of the patch in my
editor when I replied...
Using the correct version now, what we can to turn it into a positive
test:
static bool
stmt_is_memory_load_assignment (gimple *stmt)
{
return stmt
&& gimple_assign_single_p (stmt)
&& TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
&& gimple_references_memory_p (stmt)
&& !is_gimple_reg (gimple_assign_rhs1 (stmt)));
}
I'll do the same with stmt_is_memory_store_assignment().
There is already gimple_assign_load_p and gimple_store_p (which
needs && is_gimple_assign to rule out calls)
gimple_assign_load_p looks interesting. ATM we're using:
static bool
stmt_is_memory_load_assignment (gimple *stmt)
{
return stmt
&& gimple_assign_single_p (stmt)
&& TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
&& gimple_references_memory_p (stmt)
&& !is_gimple_reg (gimple_assign_rhs1 (stmt));
}
gimple_assign_load_p() does:
bool
gimple_assign_load_p (const gimple *gs)
{
tree rhs;
if (!gimple_assign_single_p (gs))
return false;
rhs = gimple_assign_rhs1 (gs);
if (TREE_CODE (rhs) == WITH_SIZE_EXPR)
return true;
if (handled_component_p (rhs))
rhs = TREE_OPERAND (rhs, 0);
return (handled_component_p (rhs)
|| DECL_P (rhs)
|| TREE_CODE (rhs) == MEM_REF
|| TREE_CODE (rhs) == TARGET_MEM_REF);
}
Seems like we can do:
return stmt
&& gimple_assign_load_p (stmt)
&& TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME;
And, checking existing code that uses gimple_assign_load_p(), I don't see
checks for lhs == SSA_NAME. Maybe we don't need it here ... guess I can
remove this check and give it a try with bootstraps and see what happens.
As for the store we're already using gimple_store_p():
return stmt
&& gimple_assign_single_p (stmt)
&& gimple_store_p (stmt)
&& gimple_references_memory_p (stmt);
gimple_store_p() does:
inline bool
gimple_store_p (const gimple *gs)
{
tree lhs = gimple_get_lhs (gs);
return lhs && !is_gimple_reg (lhs);
}
So for our usage here I think we need the additional stuff to go along
with it. Seems to be in line with how other places uses gimple_store_p(),
e.g. forwprop usually couples it with gimple_assign_single_p() and other
checks.
Thanks,
Daniel
Cheers,
Daniel
Arguably it's a bit easier to read.
+
+/* cond_removal_mispredict_memop helper that checks if a
+ given memreg operand of a bitop_stmt is a memory load,
+ and it loads the same mem addr that is later stored
+ in store_stmt:
+
+ # VUSE <.MEM_11>
+ _1 = ptr_10->bits[word_num_12]; (load_stmt)
+ (...)
+ _3 = _1 OP bitmask; (bitop_stmt)
+ # .MEM_14 = VDEF <.MEM_11>
+ ptr_10->bits[word_num_12] = _3; (store_stmt)
+
+ For the case above "_1" matches the criteria.
+
+ We're also validating whether store_stmt supports the
+ transformation by testing its LHS for read-only memory
+ and store data races. */
+
+static bool
+cond_removal_mispredict_validate_memregs (gimple *store_stmt,
+ tree memreg,
+ hash_set<tree> *nontrap)
+{
+ gimple *load_stmt = SSA_NAME_DEF_STMT (memreg);
+
+ if (!operand_equal_p (gimple_assign_rhs1 (load_stmt),
+ gimple_assign_lhs (store_stmt)))
Do you have to worry about default definitions here? Essentially a
default definition is not going to be a gimple assignment and thus
referencing gimple_assign_* is wrong. I think there's IS_DEFAULT_DEF or
something like that you can use to guard against this issue.
I wasn't aware of IS_DEFAULT_DEF. I assumed that every SSA_NAME_DEF_STMT()
would either return a gimple assign or a NULL.
I'll add checks for it.
What I don't immediately see is confirmation that the stored address is
safe to write to outside the context of the inner block conditional.
The particular case I'd worry about would be if the address happens to
be readonly in some contexts? Does NONTRAP provide you with the
necessary information to make this safe?
We're doing read-only and trap and data race checks later in the process.
+
+/* Check if a given node represents a valid bitmask for
+ the cond_removal_mispredict_memop transformation:
+ single bit mask for unconditional bit set, multiple
+ bits mask for unconditional bit clear. */
+
+static bool
+cond_removal_mispredict_valid_bitmask (tree bitmask, bool only_single_bit)
+{
+ if (TREE_CODE (bitmask) == INTEGER_CST)
+ {
+ if (!only_single_bit)
+ return true;
+ return wi::popcount (wi::to_wide (bitmask)) == 1;
+ }
+
+ /* There are several ops that can generate any bitmask, but in this
+ case we want to detect "SSA_NAME = 1 << X" that represents a
+ single bit mask. */
+ if (TREE_CODE (bitmask) == SSA_NAME)
+ {
+ gimple *def_stmt = SSA_NAME_DEF_STMT (bitmask);
+ return def_stmt
+ && gimple_assign_rhs_code (def_stmt) == LSHIFT_EXPR
+ && integer_onep (gimple_assign_rhs1 (def_stmt));
Probably need to verify you have a gimple assignment statement before
you start looking at the assign_rhs_code or assign_rhs1.
+ gimple *cond_stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (cond));
Similarly here. You probably need to audit for this problem throughout
the patch.
+
+static bool
+cond_removal_mispredict_memop (basic_block cond_bb,
+ basic_block middle_bb,
+ basic_block join_bb,
+ hash_set<tree> *nontrap)
+{
+ /* 'middle_bb' must have no PHI nodes, it must come via a
+ TRUE_VALUE edge, and it must have a store preceeding
+ a bitop:
+
+ _3 = _1 BITOP bitmask;
+ # .MEM_14 = VDEF <.MEM_11>
+ ptr_10->bits[word_num_12] = _3; */
+ if (!gimple_seq_empty_p (phi_nodes (middle_bb)))
+ return false;
+
+ edge e_cond_middle = single_pred_edge (middle_bb);
+ if (!(e_cond_middle->flags & EDGE_TRUE_VALUE))
+ return false;
+
+ gimple_stmt_iterator gsi = gsi_last_nondebug_bb (middle_bb);
[ ... ]
Do you need to verify that there's nothing else in MIDDLE_BB that
doesn't feed into the memory operations we're looking to optimize? ie,
what if there's something like a = a + 1 in MIDDLE_BB somewhere? That's
effectively a conditional add (since MIDDLE_BB is conditiionally skipped).
This is done a bit later here:
/* Verify that after bitop_stmt we only have store_stmt. */
gsi_next (&gsi);
if (gsi_stmt (gsi) != store_stmt)
return false;
And yes, we can't do this transformation if there's additional stuff that
we aren't anticipating in middle_bb.
My brain has reached its limit today. Let's get the stuff above
resolved and get an update out the door. I suspect we're not done
iterating yet. I'm increasingly worried about the readonly memory problem.
I'll send a v3 with the appropriate SSA_NAME_DEF_STMT() guards and a more
readable stmt_is_memload/memstore check.
Cheers,
Daniel
jeff