On 5/7/2026 6:01 AM, Milan Tripkovic wrote:
This patch introduces a new RTL pattern for bset + sext + or, along with
test cases for this pattern.

The patch does not fully resolve the bug itself, but only one
specific case of it. The pattern is written this way because
the issue only occurs with the bset + sext sequence when
the output is a 64-bit value, where the upper 32 bits are lost
during the sext instruction.

Example:
dest = 0xFFFFFFFF00000001
a = 29

In the initial implementation, we get:
0xFFFFFFFF00000001 | 0x0000000020000000

which results in:
0xFFFFFFFF20000001

However, with the bset + sext sequence, the behavior is incorrect:
after bset : 0xFFFFFFFF20000001
after sext : 0x0000000020000001

So the sign extension is performed incorrectly, causing the upper 32 bits to
be discarded and leading to an incorrect final result.
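
The arithmetic in the example above can be reproduced with a small C model. This is only a sketch: the helper names are made up for illustration, "sext" is assumed to mean the RV64 sext.w instruction, and the reference result is modeled on the li + sllw + or sequence that GCC currently emits.

```c
#include <assert.h>
#include <stdint.h>

/* Model of sext.w (assumed meaning of "sext"): sign-extend the low
   32 bits to 64 bits, using only unsigned arithmetic. */
static uint64_t sext_w(uint64_t x)
{
    uint64_t low = x & UINT64_C(0xFFFFFFFF);
    return (low ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
}

/* Model of bset rd,rs1,rs2 on RV64: set bit (rs2 & 63) of rs1. */
static uint64_t bset(uint64_t rs1, uint64_t rs2)
{
    return rs1 | (UINT64_C(1) << (rs2 & 63));
}

/* Reference: li a5,1 ; sllw a5,a5,a1 ; or a0,a0,a5.
   sllw shifts by (a1 & 31) and sign-extends the 32-bit result. */
static uint64_t seq_li_sllw_or(uint64_t dest, uint64_t a)
{
    uint64_t mask = sext_w(UINT64_C(1) << (a & 31));
    return dest | mask;
}

/* Problematic sequence: bset followed by sext. */
static uint64_t seq_bset_sext(uint64_t dest, uint64_t a)
{
    return sext_w(bset(dest, a));
}

/* Checks the values quoted in the example (dest and a as given there). */
void check_patch_example(void)
{
    uint64_t dest = UINT64_C(0xFFFFFFFF00000001);
    uint64_t a = 29;

    assert(bset(dest, a) == UINT64_C(0xFFFFFFFF20000001));
    assert(seq_li_sllw_or(dest, a) == UINT64_C(0xFFFFFFFF20000001));
    assert(seq_bset_sext(dest, a) == UINT64_C(0x0000000020000001));
}
```

Calling check_patch_example() passes all three assertions, which matches the values listed above: the OR-based sequence keeps the upper 32 bits of dest, while bset followed by sext replaces them with copies of bit 31.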

The issue is that both long and int cases currently share the same RTL
in the combine pass when our pattern matches, so at the moment I do not
have a clear way to distinguish between them. As a result, GCC also
generates bset + sext + or for the int case, where it should generate
only bset + sext.

Regression testing on RISC-V trunk completed with no new failures.

2026-05-07  Milan Tripkovic  <[email protected]>

gcc/ChangeLog:

      * config/riscv/bitmanip.md (bset_sextw_or): New pattern for bset.

gcc/testsuite/ChangeLog:

      * gcc.target/riscv/pr123884-c.c: New test for new pattern

So I think the first question we need to answer is whether or not this form is an improvement.  Right now we generate:

        li      a5,1
        sllw    a5,a5,a1
        or      a0,a0,a5

Your code generates:

        bset    a5,x0,a1
        sext    a5,a5
        or      a0,a0,a5

That's not any better than what we generate now.  In fact, it likely performs and encodes slightly worse.  With the current code the "li" can issue whenever the uarch wants to as it has no incoming dependencies.

You're right that the sequence I thought we could use won't work unless we know the upper 32 bits are don't care, i.e.:

        bset    a0,a0,a1
        sext    a0,a0

That gets state wrong for those upper 32 bits.  So it's not really a viable path.
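
To make the don't-care condition concrete, here is a rough C model comparing the two sequences. It is a sketch under assumptions: the function names are invented for illustration, "sext" is taken to mean sext.w, and only shift counts 0..31 are considered (the range that is valid for an int shift in C).

```c
#include <assert.h>
#include <stdint.h>

/* Model of sext.w (assumed meaning of "sext"): sign-extend the low 32 bits. */
static uint64_t sext_w(uint64_t x)
{
    uint64_t low = x & UINT64_C(0xFFFFFFFF);
    return (low ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
}

/* li a5,1 ; sllw a5,a5,a1 ; or a0,a0,a5 */
static uint64_t seq_li_sllw_or(uint64_t a0, uint64_t a1)
{
    return a0 | sext_w(UINT64_C(1) << (a1 & 31));
}

/* bset a0,a0,a1 ; sext a0,a0 */
static uint64_t seq_bset_sext(uint64_t a0, uint64_t a1)
{
    return sext_w(a0 | (UINT64_C(1) << (a1 & 63)));
}

/* Returns 1 if the low 32 bits of both results match for every a1 in 0..31. */
int low32_always_agree(uint64_t a0)
{
    for (uint64_t a1 = 0; a1 < 32; a1++) {
        uint32_t ref = (uint32_t)seq_li_sllw_or(a0, a1);
        uint32_t alt = (uint32_t)seq_bset_sext(a0, a1);
        if (ref != alt)
            return 0;
    }
    return 1;
}

/* Returns 1 if the full 64-bit results match for every a1 in 0..31. */
int full64_always_agree(uint64_t a0)
{
    for (uint64_t a1 = 0; a1 < 32; a1++) {
        if (seq_li_sllw_or(a0, a1) != seq_bset_sext(a0, a1))
            return 0;
    }
    return 1;
}
```

For a0 = 0xFFFFFFFF00000001 the low 32 bits always agree but the full 64-bit results do not, so the shorter sequence is only usable when nothing reads the upper half of the incoming a0. For an incoming value that is already a sign-extended 32-bit quantity, such as a0 = 1, the full results agree as well in this model.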

The only way that a sequence like that can work is if we know the upper 32 bits in the incoming a0 are don't care bits.  That's what I was playing with, but never got anything I was happy with.  The basic idea was:

1. Enhance ext-dce to support widening subregs to indicate that upper bits of an expression are don't care bits.  Then I added patterns to the RISC-V target to match those forms.  But I never was happy with it.

jeff
