On 10/23/2025 10:43 AM, Nikita Biryukov wrote:
While investigating Zicond extension code generation on RISC-V, I identified
several cases where GCC (trunk) generates suboptimal code due to premature
if-conversion.
Consider the following test case:
CFLAGS: -march=rv64gc_zicond -mabi=lp64d -O2
int test_IOR_ceqz_x (int x, int z, int c)
{
  if (c)
    x = x | z;
  return x;
}
Before the patch:
or a1,a0,a1
czero.eqz a1,a1,a2
czero.nez a0,a0,a2
add a0,a0,a1
ret
The issue occurs when ifcvt encounters the following RTL pattern:
(set reg1 (ior:DI (reg2:DI) (reg3:DI)))
(set reg4 (sign_extend:DI (subreg:SI (reg1:DI))))
When reg1 is no longer used, this expression could be simplified. However,
noce_convert_multiple_sets converts the block early, preventing combine from
optimizing the pattern.
This patch adds checks to bb_ok_for_noce_convert_multiple_sets to detect
such sign/zero extension patterns and reject noce_convert_multiple_sets when
combine has not yet run. This allows combine to simplify the expressions,
resulting in better code generation during the second ifcvt pass.
To minimize false positives, the additional checks only apply before the
combine pass.
Generated code for test_IOR_ceqz_x after the patch:
czero.eqz a2,a1,a2
or a0,a0,a2
ret
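As a cross-check of the two sequences quoted above, here is a minimal C sketch that models the Zicond instructions and compares both instruction sequences against the source semantics. The helper names (czero_eqz, czero_nez, seq_before, seq_after, check_sequences) are hypothetical and exist only for this illustration. The sketch assumes the int arguments arrive sign-extended in 64-bit registers, as the lp64d ABI requires.

```c
#include <assert.h>
#include <stdint.h>

/* Model of czero.eqz rd,rs1,rs2: rd = (rs2 == 0) ? 0 : rs1.  */
static int64_t czero_eqz(int64_t val, int64_t cond)
{
    return cond == 0 ? 0 : val;
}

/* Model of czero.nez rd,rs1,rs2: rd = (rs2 != 0) ? 0 : rs1.  */
static int64_t czero_nez(int64_t val, int64_t cond)
{
    return cond != 0 ? 0 : val;
}

/* The five-instruction sequence generated before the patch.  */
static int64_t seq_before(int64_t a0, int64_t a1, int64_t a2)
{
    a1 = a0 | a1;               /* or        a1,a0,a1 */
    a1 = czero_eqz(a1, a2);     /* czero.eqz a1,a1,a2 */
    a0 = czero_nez(a0, a2);     /* czero.nez a0,a0,a2 */
    return a0 + a1;             /* add       a0,a0,a1 (one operand is always 0) */
}

/* The three-instruction sequence generated after the patch.  */
static int64_t seq_after(int64_t a0, int64_t a1, int64_t a2)
{
    a2 = czero_eqz(a1, a2);     /* czero.eqz a2,a1,a2 */
    return a0 | a2;             /* or        a0,a0,a2 */
}

/* Compare both sequences with the C source semantics on sample inputs.  */
static void check_sequences(void)
{
    static const int32_t samples[] = {
        0, 1, -1, 42, INT32_MIN, INT32_MAX, 0x0f0f0f0f
    };
    const int n = (int)(sizeof samples / sizeof samples[0]);

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++) {
                /* int arguments are sign-extended to 64 bits (assumption
                   taken from the lp64d calling convention).  */
                int64_t x = samples[i], z = samples[j], c = samples[k];
                int64_t expected = c ? (x | z) : x;
                assert(seq_before(x, z, c) == expected);
                assert(seq_after(x, z, c) == expected);
            }
}
```

Calling check_sequences() from a small main exercises every combination of the sample values. This only tests the sample inputs and is not a proof of equivalence.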
The patch has been bootstrapped and tested on riscv64-unknown-linux-gnu.
gcc/
	* ifcvt.cc (noce_extended_and_dead_set_p): New function.
	(bb_ok_for_noce_convert_multiple_sets): Use
	noce_extended_and_dead_set_p.

gcc/testsuite/
	* gcc.target/riscv/zicond_ifcvt_opt_int.c: New test.
So this feels fairly hackish to me. I don't think we have any data that
says this particular class of extensions will typically be eliminated.
And elimination would depend on ABI requirements as well as target
behavior.  It's also the case that this is fairly sensitive to targets
where we implicitly promote to WORD_MODE because the target doesn't have
sub-word logical operations.
I suspect, but have not confirmed, that combine is able to eliminate the
extension because it realizes the two inputs are already sign extended,
so the result is necessarily sign extended as well and the explicit
sign extension is redundant.
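To make that suspicion concrete, here is a small C sketch of the underlying bit-level fact: if both 64-bit inputs are sign extensions of 32-bit values, their bitwise OR is too, so re-extending the low 32 bits changes nothing. The helper names (sext32, is_sign_extended, check_ior_keeps_sign_extension) are made up for this illustration. It says nothing about how combine actually reaches that conclusion, which remains unconfirmed.

```c
#include <assert.h>
#include <stdint.h>

/* Models (sign_extend:DI (subreg:SI (reg:DI) 0)): keep the low 32 bits
   and sign-extend them to 64 bits, written without relying on
   implementation-defined narrowing conversions.  */
static int64_t sext32(int64_t v)
{
    uint64_t low = (uint64_t)v & 0xffffffffull;
    return (int64_t)(low ^ 0x80000000ull) - (int64_t)0x80000000ll;
}

/* True when the upper 33 bits of v are all copies of bit 31.  */
static int is_sign_extended(int64_t v)
{
    return sext32(v) == v;
}

/* For sign-extended inputs, the IOR result is already sign extended:
   each of the upper 33 result bits equals (bit31 of a) | (bit31 of b).  */
static void check_ior_keeps_sign_extension(void)
{
    static const int32_t samples[] = {
        0, 1, -1, 42, INT32_MIN, INT32_MAX, 0x0f0f0f0f
    };
    const int n = (int)(sizeof samples / sizeof samples[0]);

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            int64_t a = samples[i], b = samples[j];
            assert(is_sign_extended(a) && is_sign_extended(b));

            int64_t r = a | b;
            /* The explicit extension is a no-op on the result.  */
            assert(sext32(r) == r);
        }
}
```

The same argument holds for the other bitwise logical operations, since each result bit depends only on the corresponding input bits.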
That points at another approach: can we eliminate the extension earlier,
never generate it to begin with, or generate it at a different
location?  fwprop seems like a potential candidate, as it will try to
simplify sign extension of this object:
(subreg:SI (ior:DI (reg/v:DI 135 [ a ])
(reg/v:DI 136 [ b ])) 0)
Unfortunately num_sign_bit_copies for those objects is not returning
anything useful in this context.
I want to think about this a bit more.  It really feels like we should
have a better solution than special-casing this in ifcvt.
Jeff