https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95019
--- Comment #3 from bin cheng <amker at gcc dot gnu.org> ---
(In reply to zhongyu...@tom.com from comment #2)
> It is a generic issue for all targets, such as x86, it also don't expand

Yes, as said, it's because SCEV currently doesn't model this, so it's not
target specific.

> IVOPTs as index is not used for DEST and Src directly. we may need expand

Yes, extending IVOPTs to handle this case (and cases from other PRs) seems
promising. Anyway, a patch is welcome, and I can do the review.

Thanks,

> IVOPTs, then different targets can select different one according their Cost
> model.
> Now, it seems ok for x86 as it have load/store insns folded the lshift
> operand, so it doesn't need separate lshift operand in loop body.
>
> ========== based on the ARM gcc 9.2.1 on https://gcc.godbolt.org, You'll get
> separate lshift operand lsl in loop kernel, and ARM64 gcc 8.2 will use ldr
> x3, [x1, x4, lsl 3] to avoid the separate lshift operand. so we can see all
> target don't select an IV with Step 8.
> C00000ADA(unsigned long long, long long*, long long*):
>         push    {r4, r5, r6, r7, lr}    @
>         mov     r4, r0                  @ len, tmp135
>         mov     r5, r1                  @ len, tmp136
>         orrs    r1, r4, r5              @ tmp137, len
>         beq     .L1                     @,
>         mov     r1, #0                  @ C000005A1,
> .L3:
>         lsl     r0, r1, #3              @ _2, C000005A1,
>         add     ip, r2, r1, lsl #3      @ tmp120, Src, C000005A1,
>         ldr     lr, [r2, r0]            @ _4, *_3
>         ldr     ip, [ip, #4]            @ _4, *_3
>         umull   r6, r7, lr, lr          @ tmp125, _4, _4
>         mul     ip, lr, ip              @ tmp122, _4, tmp122
>         adds    r1, r1, r4              @ C000005A1, C000005A1, len
>         subs    r4, r4, #1              @ len, len,
>         sbc     r5, r5, #0              @ len, len,
>         add     r0, r3, r0              @ tmp121, Dest, _2
>         add     r7, r7, ip, lsl #1      @,, tmp122,
>         orrs    lr, r4, r5              @ tmp138, len
>         stm     r0, {r6-r7}             @ *_5, tmp125
>         bne     .L3                     @,
> .L1:
>         pop     {r4, r5, r6, r7, lr}    @
>         bx      lr                      @
>
> Thanks for your notice.
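
For readers following the listing, the loop can be sketched in C. This is a reconstruction inferred from the quoted ARM assembly, not the reporter's actual test case. The names square_indexed, square_byte_offset, variants_agree, MAX_LEN and MAX_ELEMS are hypothetical. The second function is only one hand-written reading of what selecting an IV "with Step 8" could look like, assuming an 8-byte long long: the byte offset replaces the element index, so the loop body needs no separate shift.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reconstruction of the loop, inferred from the ARM listing. */
static void square_indexed(unsigned long long len,
                           const long long *src, long long *dst)
{
    unsigned long long index = 0;
    while (len != 0) {
        unsigned long long v = (unsigned long long)src[index];
        dst[index] = (long long)(v * v); /* low 64 bits of the square */
        index += len;                    /* the step shrinks by one each pass */
        len--;
    }
}

/* Hand-written variant (an assumption, not GCC output): track the byte
 * offset instead of the element index, so no per-iteration shift remains. */
static void square_byte_offset(unsigned long long len,
                               const long long *src, long long *dst)
{
    const char *s = (const char *)src;
    char *d = (char *)dst;
    size_t off = 0;                                /* index * sizeof(long long) */
    size_t step = (size_t)len * sizeof(long long); /* len * sizeof(long long)   */
    while (step != 0) {
        unsigned long long v =
            (unsigned long long)*(const long long *)(s + off);
        *(long long *)(d + off) = (long long)(v * v);
        off += step;
        step -= sizeof(long long); /* decreases by 8 when long long is 8 bytes */
    }
}

enum { MAX_LEN = 16, MAX_ELEMS = MAX_LEN * (MAX_LEN + 1) / 2 };

/* Returns 1 when both variants write identical results for this len. */
static int variants_agree(unsigned long long len)
{
    long long src[MAX_ELEMS], a[MAX_ELEMS] = {0}, b[MAX_ELEMS] = {0};
    assert(len <= MAX_LEN); /* the highest index touched is len*(len+1)/2 - 1 */
    for (size_t i = 0; i < MAX_ELEMS; i++)
        src[i] = (long long)i - 7;
    square_indexed(len, src, a);
    square_byte_offset(len, src, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

With len = 4 both variants touch elements 0, 4, 7 and 9. Whether IVOPTs could or should derive the second form automatically is the open question in this PR; the sketch only shows the source-level shape of the transformation.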