https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308

--- Comment #29 from wilco at gcc dot gnu.org ---
(In reply to Bernd Edlinger from comment #28)
> With my latest patch I bootstrapped a configuration with
> --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
> --with-float=hard
> 
> I noticed a single regression in gcc.target/arm/pr53447-*.c
> 
> That is caused by disabling the adddi3 expansion.
> 
> void t0p(long long * p)
> {
>   *p += 0x100000001;
> }
> 
> used to get compiled to this at -O2:
> 
>       ldrd    r2, [r0]
>       adds    r2, r2, #1
>       adc     r3, r3, #1
>       strd    r2, [r0]
>       bx      lr
> 
> but without the adddi3 pattern I have at -O2:
> 
>       ldr     r3, [r0]
>       ldr     r1, [r0, #4]
>       cmn     r3, #1
>       add     r3, r3, #1
>       movcc   r2, #0
>       movcs   r2, #1
>       add     r1, r1, #1
>       str     r3, [r0]
>       add     r3, r2, r1
>       str     r3, [r0, #4]
>       bx      lr

That's because your patch disables adddi3 completely, which is not correct. We
want to use the existing integer sequence, just expanded earlier. Instead of
your change, removing the "&& reload_completed" from the arm_adddi3 instruction
means we expand before register allocation:

        ldr     r3, [r0]
        ldr     r2, [r0, #4]
        adds    r3, r3, #1
        str     r3, [r0]
        adc     r2, r2, #1
        str     r2, [r0, #4]
        bx      lr
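
For reference, the adds/adc pair is just the DImode add done on two SImode
halves, with the carry out of the low word fed into the high word. A minimal C
sketch of that split follows. The names add64_split and self_check are made up
for illustration and are not anything in the GCC sources:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: add the 64-bit value (add_hi:add_lo) to (*hi:*lo)
   using only 32-bit operations, mirroring an adds/adc pair.  */
void add64_split(uint32_t *lo, uint32_t *hi, uint32_t add_lo, uint32_t add_hi)
{
    uint32_t old_lo = *lo;

    *lo = old_lo + add_lo;              /* adds: low word, wraps modulo 2^32 */
    uint32_t carry = (*lo < old_lo);    /* carry out of the low-word add */
    *hi = *hi + add_hi + carry;         /* adc: high word plus carry in */
}

/* Compare the split add against a plain 64-bit add for the
   0x100000001 constant from the testcase.  */
void self_check(void)
{
    uint64_t v = 0x00000000ffffffffULL;
    uint32_t lo = (uint32_t)v;
    uint32_t hi = (uint32_t)(v >> 32);

    add64_split(&lo, &hi, 1u, 1u);
    v += 0x100000001ULL;

    assert(lo == (uint32_t)v);
    assert(hi == (uint32_t)(v >> 32));
}
```

Once the two halves are separate SImode operations before register allocation,
they no longer need a consecutive register pair.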

> Note that also the ldrd instructions are not there.

Yes that's yet another bug...

> I think this is the effect on the ldrd that you already mentioned,
> and it gets worse when the expansion breaks the di registers up
> into two si registers.

Indeed, splitting early means we end up with 2 loads. However in most cases we
should be able to gather the loads and emit LDRD/STRD on Thumb-2 (ARM's
LDRD/STRD is far more limited so not as useful). Combine could help with
merging 2 loads/stores into a single instruction.
