This patch tweaks the i386 back-end's ix86_split_ashl to implement
doubleword left shifts by 1 bit, using an add followed by an
add-with-carry (i.e. a doubleword x+x) instead of the x86's shld
instruction.  The replacement sequence requires fewer bytes and is
faster on both Intel and AMD architectures (according to Agner Fog's
instruction tables, and confirmed by my own microbenchmarking).
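
In C terms, the transformation is just the identity x << 1 == x + x,
with the carry out of the low word folded into the high word.  A
minimal model of the new sequence (the u128_t type and shl1 helper
below are hypothetical, for illustration only):

#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128_t;

static u128_t shl1 (u128_t x)
{
  u128_t r;
  /* addq: double the low half; c captures the carry-out.  */
  unsigned c = __builtin_add_overflow (x.lo, x.lo, &r.lo);
  /* adcq: double the high half and add the carry back in.  */
  r.hi = x.hi + x.hi + (uint64_t) c;
  return r;
}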

For the test case:
__int128 foo(__int128 x) { return x << 1; }

with -O2 we previously generated:

foo:    movq    %rdi, %rax
        movq    %rsi, %rdx
        shldq   $1, %rdi, %rdx
        addq    %rdi, %rax
        ret

with this patch we now generate:

foo:    movq    %rdi, %rax
        movq    %rsi, %rdx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
        ret
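
For those curious, the new splitter path simply emits the two named
patterns from the ChangeLog below in sequence.  A rough sketch of the
count == 1 case, assuming the usual split_double_mode low[]/high[]
arrays and the operand order of the @add<mode>3_carry pattern (this is
a reconstruction, not the exact patch hunk):

  /* Double the low word, leaving the carry in the flags register.  */
  emit_insn (gen_add3_cc_overflow_1 (half_mode, low[0],
                                     low[1], low[1]));
  /* Double the high word, adding the carry back in.  */
  rtx cc = gen_rtx_REG (CCCmode, FLAGS_REG);
  emit_insn (gen_add3_carry (half_mode, high[0], high[1], high[1],
                             cc, gen_rtx_LTU (half_mode, cc, const0_rtx)));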


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2023-10-05  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386-expand.cc (ix86_split_ashl): Split shifts by
        one into add3_cc_overflow_1 followed by add3_carry.
        * config/i386/i386.md (@add<mode>3_cc_overflow_1): Renamed from
        "*add<mode>3_cc_overflow_1" to provide a generator function.

gcc/testsuite/ChangeLog
        * gcc.target/i386/ashldi3-2.c: New 32-bit test case.
        * gcc.target/i386/ashlti3-3.c: New 64-bit test case.


Thanks in advance,
Roger
--

