https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125905
Bug ID: 125905
Summary: Improve if-conversion when true/false values are closely related
Product: gcc
Version: 17.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: law at gcc dot gnu.org
Target Milestone: ---
Consider this code:
unsigned long
cond_mask_0 (bool flag, unsigned long mask, unsigned long target)
{
  return flag ? target | mask : target & ~mask;
}
Compiled with -march=rv64gcbv_zicond we get something like this:
or a5,a1,a2 # 37 [c=4 l=4] *iordi3/0
andn a1,a2,a1 # 38 [c=4 l=4] and_notdi3
czero.eqz a5,a5,a0 # 41 [c=4 l=4] *czero.eqz.didi
czero.nez a0,a1,a0 # 40 [c=4 l=4] *czero.nez.didi
add a0,a5,a0 # 42 [c=4 l=4] *adddi3/0
ret # 52 [c=0 l=4] simple_return
That's not bad. But we can do better. The trick is to realize that the two
values we're selecting across are closely related.
We essentially have (with x = mask, y = target, c = flag)
result = c ? x | y : ~x & y;
And we know that x | (~x & y) == x | y
So a bit of substitution:
result = c ? x | (~x & y) : ~x & y;
And factoring
t = (~x & y);
result = c ? x | t : t
So it's just a conditional ior. And since x and t have no set bits in
common, the ior can be done with an add:
andn t0, a2, a1
czero.eqz t1, a1, a0
add a0, t0, t1
So the select goes from 5 instructions to 3. Probably not any faster on a
2-wide or wider core, but still worth doing.
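As a sanity check on the rewrite, here is a minimal C sketch that models the
three-instruction sequence and compares it against the original select. The
helper names (czero_eqz, cond_mask_ref, cond_mask_opt, check_all_8bit) are made
up for illustration, and it assumes the czero.eqz source operand is the mask
register (a1).

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of Zicond czero.eqz: yields 0 when cond is zero, else val. */
static uint64_t czero_eqz(uint64_t val, uint64_t cond)
{
  return cond == 0 ? 0 : val;
}

/* The original select from the test case. */
static uint64_t cond_mask_ref(bool flag, uint64_t mask, uint64_t target)
{
  return flag ? (target | mask) : (target & ~mask);
}

/* The proposed form: andn + czero.eqz + add. */
static uint64_t cond_mask_opt(bool flag, uint64_t mask, uint64_t target)
{
  uint64_t t0 = target & ~mask;        /* andn      t0, a2, a1 */
  uint64_t t1 = czero_eqz(mask, flag); /* czero.eqz t1, a1, a0 (assumed operand) */
  /* t0 has every mask bit cleared and t1 is either mask or 0, so
     t0 & t1 == 0 and the add cannot carry: add behaves as ior here. */
  return t0 + t1;                      /* add       a0, t0, t1 */
}

/* Returns 1 if both forms agree for every 8-bit mask/target pair and
   both flag values, 0 otherwise. */
static int check_all_8bit(void)
{
  for (unsigned m = 0; m < 256; m++)
    for (unsigned t = 0; t < 256; t++)
      for (int f = 0; f < 2; f++)
        if (cond_mask_ref(f != 0, m, t) != cond_mask_opt(f != 0, m, t))
          return 0;
  return 1;
}
```

The 8-bit sweep is enough to exercise every per-bit combination of flag, mask
and target, since no carries occur in the add.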
I haven't thought a ton about implementation details. I did confirm that we
see the form we want in noce_try_cond_arith.
Breakpoint 1, noce_try_cond_arith (if_info=0x7fffffffe3d0) at
/home/jlaw/test/gcc/gcc/ifcvt.cc:3232
3232 rtx cond = if_info->cond;
(ior:DI (reg:DI 147 [ mask ])
(reg:DI 148 [ target ]))
$8 = void
(and:DI (not:DI (reg:DI 147 [ mask ]))
(reg:DI 148 [ target ]))
$9 = void
We could try to optimize/canonicalize the arms in here or its caller. I would
expect that if it's canonicalized into a conditional IOR the right things will
just happen in ifcvt.
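To make the idea concrete, here is a rough sketch of the kind of matching this
would need. It deliberately uses a toy expression tree rather than GCC's rtx
API: toy_expr, toy_code, same_reg and match_ior_andn are all hypothetical names,
and a real patch in ifcvt.cc would look different.

```c
#include <stddef.h>

/* Hypothetical toy expression tree; this is NOT GCC's rtx. */
enum toy_code { TOY_REG, TOY_NOT, TOY_AND, TOY_IOR };

struct toy_expr {
  enum toy_code code;
  int regno;                         /* only meaningful for TOY_REG */
  const struct toy_expr *op0, *op1;  /* op1 is unused for TOY_NOT */
};

/* True if both expressions are the same register. */
static int same_reg(const struct toy_expr *a, const struct toy_expr *b)
{
  return a && b && a->code == TOY_REG && b->code == TOY_REG
         && a->regno == b->regno;
}

/* If A is (ior x y) and B is (and (not x) y) for registers x and y,
   store x in *x_out and return 1.  The caller may then rewrite
   "c ? A : B" as "B | (c ? x : 0)".  Otherwise return 0. */
static int match_ior_andn(const struct toy_expr *a, const struct toy_expr *b,
                          const struct toy_expr **x_out)
{
  if (a->code != TOY_IOR || b->code != TOY_AND)
    return 0;

  /* AND is commutative, so the NOT may be either operand. */
  const struct toy_expr *n = b->op0, *y = b->op1;
  if (n->code != TOY_NOT) {
    n = b->op1;
    y = b->op0;
  }
  if (n->code != TOY_NOT)
    return 0;

  const struct toy_expr *x = n->op0;

  /* IOR is commutative too: accept (ior x y) or (ior y x). */
  if ((same_reg(a->op0, x) && same_reg(a->op1, y))
      || (same_reg(a->op0, y) && same_reg(a->op1, x))) {
    *x_out = x;
    return 1;
  }
  return 0;
}
```

With the two arms from the dump above, x comes back as the mask register and
the else arm becomes the unconditional part of the conditional ior.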
Note this happens in if-conversion *after* reload. So the most natural place
to clean some of this up (combine/simplify-rtx) isn't applicable.