On 10/28/23 07:05, Roger Sayle wrote:
This patch improves the code generated for X << 1 (and for X + X) when
X is 64-bit DImode, reusing the same two-instruction sequence already
used for DImode addition.
For the test case:
long long foo(long long x) { return x << 1; }
GCC -O2 currently generates the following code:
foo:    lsr     r2,r0,31
        asl_s   r1,r1,1
        asl_s   r0,r0,1
        j_s.d   [blink]
        or_s    r1,r1,r2
and on CPUs without a barrel shifter, i.e. -mcpu=em:
foo:    add.f   0,r0,r0
        asl_s   r1,r1
        rlc     r2,0
        asl_s   r0,r0
        j_s.d   [blink]
        or_s    r1,r1,r2
With this patch, GCC generates (both with and without a barrel shifter):
foo:    add.f   r0,r0,r0
        j_s.d   [blink]
        adc     r1,r1,r1
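The new sequence works because x << 1 equals x + x: doubling the low word with add.f leaves the bit shifted out of r0 in the carry flag, and adc doubles the high word while adding that carry back in. The C sketch below is not part of the patch; it is a hypothetical model of the r0:r1 register pair as two uint32_t halves that mirrors each of the three listings above instruction by instruction, so the sequences can be checked against a native 64-bit shift. The names reg_pair, split, join, shl1_barrel, shl1_no_barrel, shl1_add_adc and check_shl1 are illustrative only, and the model uses unsigned arithmetic (the test case uses long long) so that v << 1 is well defined for every bit pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a DImode value in two 32-bit ARC registers:
   r0 holds the low word, r1 the high word.  */
typedef struct {
    uint32_t lo; /* r0 */
    uint32_t hi; /* r1 */
} reg_pair;

static reg_pair split(uint64_t v)
{
    reg_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t join(reg_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Old sequence with a barrel shifter.  j_s.d [blink] is omitted: the
   instruction after it runs in its delay slot, i.e. before returning.  */
static reg_pair shl1_barrel(reg_pair x)
{
    uint32_t r2 = x.lo >> 31;   /* lsr   r2,r0,31 : bit leaving the low word */
    x.hi <<= 1;                 /* asl_s r1,r1,1 */
    x.lo <<= 1;                 /* asl_s r0,r0,1 */
    x.hi |= r2;                 /* or_s  r1,r1,r2 (delay slot) */
    return x;
}

/* Old sequence without a barrel shifter (-mcpu=em).  */
static reg_pair shl1_no_barrel(reg_pair x)
{
    uint32_t carry = x.lo >> 31; /* add.f 0,r0,r0 : result discarded, sets carry */
    x.hi <<= 1;                  /* asl_s r1,r1 */
    uint32_t r2 = carry;         /* rlc   r2,0 : (0 << 1) | carry */
    x.lo <<= 1;                  /* asl_s r0,r0 */
    x.hi |= r2;                  /* or_s  r1,r1,r2 (delay slot) */
    return x;
}

/* New sequence: x << 1 computed as x + x using add with carry.  */
static reg_pair shl1_add_adc(reg_pair x)
{
    uint32_t lo = x.lo + x.lo;    /* add.f r0,r0,r0 : wraps modulo 2^32 */
    uint32_t carry = lo < x.lo;   /* the sum wrapped iff a carry was produced */
    x.hi = x.hi + x.hi + carry;   /* adc   r1,r1,r1 (delay slot) */
    x.lo = lo;
    return x;
}

/* Checks all three register-pair models against the native 64-bit shift.  */
void check_shl1(uint64_t v)
{
    uint64_t want = v << 1;
    assert(join(shl1_barrel(split(v))) == want);
    assert(join(shl1_no_barrel(split(v))) == want);
    assert(join(shl1_add_adc(split(v))) == want);
}
```

Values such as 0x80000000, where the carry crosses the word boundary, and 0xFFFFFFFFFFFFFFFF, where the top bit is discarded, exercise the interesting cases.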
[For Jeff Law's benefit, a similar optimization is also applicable to the
H8300H, which could likewise use a two-instruction sequence (plus rts),
but GCC currently generates 16 instructions (plus an rts) for foo above.]
Tested with a cross-compiler to arc-linux hosted on x86_64,
with no new (compile-only) regressions from make -k check.
Ok for mainline if this passes Claudiu's nightly testing?
WRT H8: bug filed so we don't lose track of it. We don't have DImode
operations defined on the H8; the first step would be DImode loads/stores
and basic arithmetic.
Jeff