https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121136

--- Comment #3 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jeff Law <[email protected]>:

https://gcc.gnu.org/g:5e93ab0da7a700371a72683ea6c7a1fd2bc23907

commit r16-5060-g5e93ab0da7a700371a72683ea6c7a1fd2bc23907
Author: Your Name <[email protected]>
Date:   Thu Nov 6 09:50:22 2025 -0700

    [RISC-V][PR 121136] Improve various tests which only need to examine upper bits in a GPR

    So pre-commit CI flagged an issue with the initial version of this patch.
    In particular the cmp-mem-const-{1,2} tests are failing.

    I didn't see that in my internal testing, but that could well be an
    artifact of having multiple patches touching the same broad space that
    the tester is evaluating.  If I apply just this patch I can trigger the
    cmp-mem-const-{1,2} failures.

    The code we're getting now is actually better than what we were getting
    before, but the new patterns avoid the path through combine that emits
    the message about narrowing the load down to a byte load, hence the
    failure.

    Given we're getting better code now than before, I'm just skipping this
    test on risc-v.  That's the only non-whitespace change since the original
    version of this patch.

    --

    This addresses the first level issues seen in generating better
    performing code for testcases derived from pr121136.  It likely regresses
    code size in some cases, since it often selects code sequences that
    should be better performing, though larger to encode.

    Improving -Os code generation should remain the primary focus of
    pr121136.  Any improvements in code size with this change are a nice side
    effect, but not the primary goal.

    --

    Let's take this test (derived from the PR):

    _Bool func1_0x1U (unsigned int x) { return x <= 0x1U; }

    _Bool func2_0x1U (unsigned int x) { return ((x >> __builtin_ctz (0x1U + 1U)) == 0); }

    _Bool func3_0x1U (unsigned int x) { return ((x / (0x1U + 1U)) == 0); }

    Those should produce the same output.  We currently get these fragments
    for the 3 cases.  In particular note how the second variant is a two
    instruction sequence.

            sltiu   a0,a0,2

            srliw   a0,a0,1
            seqz    a0,a0

            sltiu   a0,a0,2

    This patch will adjust that second sequence to match the first and
    third, which are optimal.

    Let's take another case.  This is interesting as it's right at the simm12
    border:

    _Bool func1_0x7ffU (unsigned long x) { return x <= 0x7ffU; }

    _Bool func2_0x7ffU (unsigned long x) { return ((x >> __builtin_ctzl (0x7ffU + 1UL)) == 0); }

    _Bool func3_0x7ffU (unsigned long x) { return ((x / (0x7ffU + 1UL)) == 0); }

    We get:

            li      a5,2047
            sltu    a0,a5,a0
            seqz    a0,a0

            srli    a0,a0,11
            seqz    a0,a0

            li      a5,2047
            sltu    a0,a5,a0
            seqz    a0,a0

    In this case the second sequence is pretty good.  Not perfect, but
    clearly better than the other two.  This patch will fix the code for
    case #1 and case #3.

    So anyway, that's the basic motivation here.  So to be 100% clear, while
    the bug is focused on code size, I'm focused on the performance of the
    resulting code.

    This has been tested on riscv32-elf and riscv64-elf.  It's also
    bootstrapped and regression tested on the Pioneer.  The BPI won't have
    results for this patch until late tomorrow.

    --

            PR rtl-optimization/121136
    gcc/
            * config/riscv/riscv.md: Add define_insn to test the upper bits of a
            register against zero using sltiu when the bits are extracted via
            zero_extract or logical right shift.
            Add 3->2 define_splits for gtu/leu cases testing upper bits against
            zero.

    gcc/testsuite
            * gcc.target/riscv/pr121136.c: New test.
            * gcc.dg/cmp-mem-const-1.c: Skip for risc-v.
            * gcc.dg/cmp-mem-const-2.c: Likewise.
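
For readers who want to convince themselves that the three source forms in the commit message really are interchangeable, here is a minimal sketch in C. It is not taken from the patch or from pr121136.c. All helper names (le_form, shift_form, div_form, count_mismatches, leu_fits_sltiu, upper_bits_shift) are hypothetical, and the constant is assumed to have the form C = 2^k - 1 as in the examples (k = 1 for 0x1U, k = 11 for 0x7ffU). The last two helpers illustrate the "simm12 border" remark under a simplifying assumption: they only consider small non-negative sltiu immediates, whose upper limit is 2047.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The three forms from the commit message, parameterised on k, where the
   constant is C = 2^k - 1.  Hypothetical helpers; valid for 1 <= k <= 31.  */
static bool le_form(uint32_t x, unsigned k)
{
    return x <= (UINT32_C(1) << k) - 1;      /* x <= C */
}

static bool shift_form(uint32_t x, unsigned k)
{
    return (x >> k) == 0;                    /* x >> ctz (C + 1) == 0 */
}

static bool div_form(uint32_t x, unsigned k)
{
    return (x / (UINT32_C(1) << k)) == 0;    /* x / (C + 1) == 0 */
}

/* Count probe values (around the 2^k boundary and at the extremes) for
   which the three forms disagree.  */
static unsigned count_mismatches(unsigned k)
{
    assert(k >= 1 && k <= 31);
    const uint32_t c = (UINT32_C(1) << k) - 1;
    const uint32_t probes[] = { 0, 1, c - 1, c, c + 1, c + 2,
                                UINT32_MAX - 1, UINT32_MAX };
    unsigned mismatches = 0;

    for (size_t i = 0; i < sizeof probes / sizeof probes[0]; i++) {
        bool a = le_form(probes[i], k);
        if (a != shift_form(probes[i], k) || a != div_form(probes[i], k))
            mismatches++;
    }
    return mismatches;
}

/* "x <= c" is "x < c + 1", so a single sltiu needs c + 1 as its immediate.
   Assumption: only small non-negative immediates (at most 2047) are
   considered here.  */
static bool leu_fits_sltiu(uint64_t c)
{
    return c <= 2046;
}

/* If c + 1 is a power of two, return the shift count k such that
   "x <= c" is equivalent to "(x >> k) == 0"; otherwise return -1.
   Uses the GCC/Clang builtin __builtin_ctzll.  */
static int upper_bits_shift(uint64_t c)
{
    if (c == UINT64_MAX)
        return -1;                           /* c + 1 would wrap to 0 */
    uint64_t n = c + 1;
    if ((n & (n - 1)) != 0)
        return -1;                           /* not a power of two */
    return __builtin_ctzll(n);
}
```

With these definitions, count_mismatches(1) and count_mismatches(11) should both report no disagreement. leu_fits_sltiu(0x1) holds while leu_fits_sltiu(0x7ff) does not, because 0x7ff + 1 = 2048 is one past the largest positive 12-bit signed immediate. That is why the 0x7ffU example falls back to a shift by upper_bits_shift(0x7ff), which is 11, followed by seqz.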
