https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106244

--- Comment #2 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jeff Law <[email protected]>:

https://gcc.gnu.org/g:0200c300bd8b0d364c885637482a10cb9467580c

commit r17-394-g0200c300bd8b0d364c885637482a10cb9467580c
Author: Jeff Law <[email protected]>
Date:   Thu May 7 16:52:25 2026 -0600

    [RISC-V][tree-optimization/106244] Improve code when generating (1 << N) & 0x1

    So as noted in the PR, GCC fails to optimize this well:

    int8_t f(int8_t x)
    {
        int8_t sh = 1 << x;
        return sh & 1;
    }

    I strongly suspect this kind of code is exceedingly rare in practice.  I just
    happened to notice that it could be improved when looking for bugs to pass
    along to Shreya & Austin.  As noted in the PR, most of the time this is
    cleaned up in gimple, but in some cases it slips through.

    I'd love to tackle this in simplify-rtx, but SHIFT_COUNT_TRUNCATED, mode
    handling for shift counts, subregs to deal with 32-bit objects on 64-bit
    targets, etc. make it fairly messy.  Rather than spend a ton of time on it,
    I've just created a simple RISC-V splitter to handle the case of a 32-bit
    shift on rv64.  The other cases can't be optimized.

    For rv64 we generate:

            li      a5,1            # 7     [c=4 l=4] *movsi_internal/1
            sllw    a0,a5,a0        # 8     [c=8 l=4]  ashlsi3_extend
            andi    a0,a0,1 # 17    [c=4 l=4]  *anddi3/1

    Instead we can generate:

            andi    a0,a0,31        # 8     [c=4 l=4]  *anddi3/1
            seqz    a0,a0   # 17    [c=4 l=4]  *seq_zero_didi

    I purposefully added the masking of the shift count.  While the RISC-V port
    does not define SHIFT_COUNT_TRUNCATED, it does have patterns that optimize
    away the masking when they can.  If the masking got optimized away on the
    assumption the count would be used in a shift/rotate and thus masked by the
    hardware, we could have junk in the upper bits.  It's worth noting that
    because of the need to sanitize the shift count we're generating 2 insns,
    thus we can't really improve for rv32 or for 64-bit objects on rv64.  If we
    didn't need to do that, this would be a define_insn that generated a single
    instruction.
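
    The equivalence the splitter relies on can be checked with a small C
    sketch (my own illustration, not part of the patch): for a 32-bit shift,
    (1 << x) & 1 is nonzero exactly when the effective 5-bit shift count is
    zero, which is what the andi/seqz pair computes.

    #include <assert.h>

    int main(void)
    {
        /* Loop only over valid shift counts; shifting by >= 32 is
           undefined behavior in C, and the hardware masks to 5 bits
           anyway.  */
        for (int x = 0; x < 32; x++) {
            int before = (1 << x) & 1;        /* original: sllw + andi */
            int after  = ((x & 31) == 0);     /* optimized: andi + seqz */
            assert(before == after);
        }
        return 0;
    }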

    Bootstrapped and regression tested on rv64 on both the BPI and the Pioneer.
    Also regression tested on riscv32-elf and riscv64-elf.  Planning to push
    once pre-commit CI gives the green light.

            PR tree-optimization/106244
    gcc/
        * config/riscv/riscv.md ((and (ashift X Y) const_int 1)): New splitter.

    gcc/testsuite/
            * gcc.target/riscv/pr106244.c: New test.
