On 5/7/2026 6:01 AM, Milan Tripkovic wrote:
This patch introduces a new RTL pattern for bset + sext + or, along
with test cases for this pattern.
The patch does not fully resolve the bug; it handles only one
specific case of it. The pattern is written this way because
the issue only occurs with the bset + sext sequence when
the output is a 64-bit value, where the upper 32 bits are lost
during the sext instruction.
Example:
dest = 0xFFFFFFFF00000001
a = 29
In the initial implementation, we get:
0xFFFFFFFF00000001 | 0x0000000020000000
which results in:
0xFFFFFFFF20000001
However, with the bset + sext sequence, the behavior is incorrect:
after bset : 0xFFFFFFFF20000001
after sext : 0x0000000020000001
So applying the sign extension after bset discards the original
upper 32 bits of dest, leading to an incorrect final result.
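To double-check the arithmetic above, here is a small Python model of the two computations. It is only an illustrative sketch: the helper names (sext_w, bset, expected, bset_then_sext) are hypothetical, and it assumes RV64, with "sext" meaning sext.w (sign-extend bit 31) and bset indexing bits modulo 64.

```python
MASK64 = (1 << 64) - 1

def sext_w(x: int) -> int:
    # Model of RV64 sext.w: keep bits 0..31 and copy bit 31 into bits 32..63.
    low = x & 0xFFFFFFFF
    if low & 0x80000000:
        low -= 1 << 32
    return low & MASK64

def bset(rs1: int, rs2: int) -> int:
    # Model of Zbs bset: set bit (rs2 mod 64) of rs1.
    return (rs1 | (1 << (rs2 & 63))) & MASK64

def expected(dest: int, a: int) -> int:
    # dest | (1 << a), with the int-typed (1 << a) sign-extended to 64 bits.
    return (dest | sext_w(1 << a)) & MASK64

def bset_then_sext(dest: int, a: int) -> int:
    # The problematic form: bset dest,dest,a ; sext dest,dest
    return sext_w(bset(dest, a))

dest, a = 0xFFFFFFFF00000001, 29
print(hex(expected(dest, a)))        # 0xffffffff20000001
print(hex(bset_then_sext(dest, a)))  # 0x20000001
```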
The issue is that both the long and int cases currently share the
same RTL in the combine pass when our pattern matches, so at the
moment I do not have a clear way to distinguish between them. As a
result, GCC also generates bset + sext + or for the int case, where
it should generate only bset + sext.
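For the int case, bset + sext alone is enough because sext.w only reads the low 32 bits, so whatever sits in the upper half of the input register does not matter. A hedged Python sketch of that argument, with hypothetical helper names and assuming shift counts 0..31, compares bset + sext.w against the 32-bit result of x | (1 << a) sign-extended to 64 bits:

```python
MASK64 = (1 << 64) - 1

def sext_w(x: int) -> int:
    # Model of RV64 sext.w: keep bits 0..31 and copy bit 31 into bits 32..63.
    low = x & 0xFFFFFFFF
    if low & 0x80000000:
        low -= 1 << 32
    return low & MASK64

def bset(rs1: int, rs2: int) -> int:
    # Model of Zbs bset: set bit (rs2 mod 64) of rs1.
    return (rs1 | (1 << (rs2 & 63))) & MASK64

def int_or_bit(x: int, a: int) -> int:
    # Hypothetical model of `int f(int x, int a) { return x | (1 << a); }`:
    # compute the 32-bit result, then hold it sign-extended in a 64-bit register.
    return sext_w((x | (1 << a)) & 0xFFFFFFFF)

def int_case_ok(samples, shifts=range(32)) -> bool:
    # bset + sext.w must match the sign-extended 32-bit result for every
    # register value, whatever its upper 32 bits contain.
    return all(sext_w(bset(x, a)) == int_or_bit(x, a)
               for x in samples for a in shifts)

samples = [0, 1, 0x7FFFFFFF, 0xFFFFFFFF00000001, 0x123456789ABCDEF0, MASK64]
print(int_case_ok(samples))  # True
```

This is why the trailing "or" is only needed in the long case: there the upper 32 bits of dest carry real data that sext.w would overwrite.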
Regression testing on RISC-V trunk completed with no new failures.
2026-05-07 Milan Tripkovic <[email protected]>
gcc/ChangeLog:
* config/riscv/bitmanip.md (bset_sextw_or): New pattern for bset + sext + or.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr123884-c.c: New test.
So I think the first question we need to answer is whether or not this
form is an improvement. Right now we generate:
li a5,1
sllw a5,a5,a1
or a0,a0,a5
Your code generates:
bset a5,x0,a1
sext a5,a5
or a0,a0,a5
That's not any better than what we generate now. In fact, it likely
performs and encodes slightly worse. With the current code, the "li" can
issue whenever the uarch wants to, as it has no incoming dependencies.
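One way to confirm that the two sequences differ only in scheduling and encoding, not in the value they compute, is to model both. The sketch below is illustrative only: the helpers are hypothetical, "sext" is taken to mean sext.w, and shift counts are limited to 0..31 (in C, shifting a 32-bit int by 32 or more is undefined, and the two sequences do diverge there).

```python
MASK64 = (1 << 64) - 1

def sext_w(x: int) -> int:
    # Model of RV64 sext.w: keep bits 0..31 and copy bit 31 into bits 32..63.
    low = x & 0xFFFFFFFF
    if low & 0x80000000:
        low -= 1 << 32
    return low & MASK64

def bset(rs1: int, rs2: int) -> int:
    # Model of Zbs bset: set bit (rs2 mod 64) of rs1.
    return (rs1 | (1 << (rs2 & 63))) & MASK64

def sllw(rs1: int, rs2: int) -> int:
    # Model of RV64 sllw: shift the low 32 bits left by (rs2 mod 32),
    # then sign-extend the 32-bit result.
    return sext_w((rs1 << (rs2 & 31)) & 0xFFFFFFFF)

def current_seq(a0: int, a1: int) -> int:
    a5 = 1                      # li   a5,1
    a5 = sllw(a5, a1)           # sllw a5,a5,a1
    return (a0 | a5) & MASK64   # or   a0,a0,a5

def patch_seq(a0: int, a1: int) -> int:
    a5 = bset(0, a1)            # bset a5,x0,a1
    a5 = sext_w(a5)             # sext a5,a5
    return (a0 | a5) & MASK64   # or   a0,a0,a5

samples = [0, 0xFFFFFFFF00000001, 0x123456789ABCDEF0, MASK64]
print(all(current_seq(x, a) == patch_seq(x, a)
          for x in samples for a in range(32)))  # True
```

Since the results match, the comparison comes down to the dependency chain: "li a5,1" has no inputs, while "bset a5,x0,a1" must wait for a1.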
You're right that the sequence I thought we could use won't work unless
we know the upper 32 bits are don't-care bits, i.e.:
bset a0,a0,a1
sext a0,a0
That gets the state of those upper 32 bits wrong, so it's not really a
viable path.
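The short sequence is tempting because it is correct in the low 32 bits for every input; it only corrupts bits 32..63. A minimal Python sketch, with hypothetical helpers (correct_result, fused_result, low32) and the same assumed sext.w and bset models, makes that explicit, which is why it becomes valid only when the upper 32 bits are known to be don't-care bits:

```python
MASK64 = (1 << 64) - 1

def sext_w(x: int) -> int:
    # Model of RV64 sext.w: keep bits 0..31 and copy bit 31 into bits 32..63.
    low = x & 0xFFFFFFFF
    if low & 0x80000000:
        low -= 1 << 32
    return low & MASK64

def bset(rs1: int, rs2: int) -> int:
    # Model of Zbs bset: set bit (rs2 mod 64) of rs1.
    return (rs1 | (1 << (rs2 & 63))) & MASK64

def correct_result(dest: int, a: int) -> int:
    # dest | (1 << a), where the int-typed (1 << a) is sign-extended first.
    return (dest | sext_w(1 << a)) & MASK64

def fused_result(dest: int, a: int) -> int:
    # bset dest,dest,a ; sext dest,dest
    return sext_w(bset(dest, a))

def low32(x: int) -> int:
    # Only the bits a consumer would use if the upper half were don't-care.
    return x & 0xFFFFFFFF

dest, a = 0xFFFFFFFF00000001, 29
print(low32(correct_result(dest, a)) == low32(fused_result(dest, a)))  # True
print(correct_result(dest, a) == fused_result(dest, a))                # False
```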
The only way a sequence like that works is if we know the upper 32
bits in the incoming a0 are don't-care bits. That's what I was playing
with, but I never got anything I was happy with. The basic idea was:
1. Enhance ext-dce to support widening subregs to indicate that the upper
bits of an expression are don't-care bits. Then I added patterns to the
RISC-V target to match those forms, but I was never happy with it.
jeff