[Bug tree-optimization/95019] Optimizer produces suboptimal code related to -ftree-ivopts

amker at gcc dot gnu.org Wed, 13 May 2020 00:10:19 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95019


--- Comment #3 from bin cheng <amker at gcc dot gnu.org> ---
(In reply to zhongyu...@tom.com from comment #2)
> It is a generic issue for all targets; x86, for example, also doesn't expand
Yes, as I said, this is because SCEV currently doesn't model this case, so it's
not target-specific.

> IVOPTs, as index is not used for Dest and Src directly. We may need to expand
Yes, extending IVOPTs to handle this case (and cases from other PRs) seems
promising.
Anyway, a patch is welcome, and I can review it.
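
For reference, here is a minimal sketch of the kind of loop under discussion.
It is only my reading of the quoted ARM assembly (index += len, len -= 1,
Dest[index] = Src[index] * Src[index]), not the original test case, and the
names square_strided and square_strided_scaled are made up for illustration.
The first function uses the index directly, so each access needs a shift by 3.
The second is a hand-written version of what a pre-scaled IV might look like:
the byte offset is carried across iterations and its step shrinks by
sizeof(long long) each time. Note that the index step is len, which changes
every iteration, so the step is not loop-invariant.

```c
#include <stddef.h>

/* Assumed shape of the loop, inferred from the quoted assembly.
   The index step is len, which itself decreases every iteration. */
void square_strided(unsigned long long len, const long long *src,
                    long long *dest)
{
    unsigned long long index = 0;

    while (len != 0) {
        dest[index] = src[index] * src[index];
        index += len;   /* step is not loop-invariant */
        len--;
    }
}

/* Hand-written illustration of a pre-scaled induction variable:
   off is always index * sizeof(long long), so no shift is needed
   when forming the addresses. */
void square_strided_scaled(unsigned long long len, const long long *src,
                           long long *dest)
{
    size_t off = 0;                              /* byte offset */
    size_t step = (size_t)len * sizeof(long long); /* scaled step */

    while (len != 0) {
        const long long *s = (const long long *)((const char *)src + off);
        long long *d = (long long *)((char *)dest + off);

        *d = *s * *s;
        off += step;
        step -= sizeof(long long);  /* mirrors len-- in scaled units */
        len--;
    }
}
```

Both functions touch the same elements (for len = 4: indices 0, 4, 7 and 9),
so the caller must provide arrays large enough for the last index. Whether
the second form is actually cheaper depends on the target's addressing modes
and cost model.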

Thanks,
> IVOPTs, so that different targets can select a different one according to
> their cost model.
> Now, it seems OK for x86, as its load/store insns fold the lshift
> operand, so it doesn't need a separate lshift operation in the loop body.
> 
> ========== Based on ARM gcc 9.2.1 at https://gcc.godbolt.org, you'll get a
> separate lshift instruction (lsl) in the loop kernel, while ARM64 gcc 8.2 uses
> ldr x3, [x1, x4, lsl 3] to avoid the separate lshift. So we can see that no
> target selects an IV with step 8.
> C00000ADA(unsigned long long, long long*, long long*):
>         push    {r4, r5, r6, r7, lr}    @
>         mov     r4, r0    @ len, tmp135
>         mov     r5, r1    @ len, tmp136
>         orrs    r1, r4, r5      @ tmp137, len
>         beq     .L1             @,
>         mov     r1, #0    @ C000005A1,
> .L3:
>         lsl     r0, r1, #3        @ _2, C000005A1,
>         add     ip, r2, r1, lsl #3        @ tmp120, Src, C000005A1,
>         ldr     lr, [r2, r0]      @ _4, *_3
>         ldr     ip, [ip, #4]      @ _4, *_3
>         umull   r6, r7, lr, lr        @ tmp125, _4, _4
>         mul     ip, lr, ip        @ tmp122, _4, tmp122
>         adds    r1, r1, r4      @ C000005A1, C000005A1, len
>         subs    r4, r4, #1      @ len, len,
>         sbc     r5, r5, #0        @ len, len,
>         add     r0, r3, r0        @ tmp121, Dest, _2
>         add     r7, r7, ip, lsl #1        @,, tmp122,
>         orrs    lr, r4, r5      @ tmp138, len
>         stm     r0, {r6-r7}       @ *_5, tmp125
>         bne     .L3             @,
> .L1:
>         pop     {r4, r5, r6, r7, lr}      @
>         bx      lr  @
> 
> Thanks for your notice.
