On Fri, Apr 17, 2020 at 1:10 PM Kewen.Lin via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hi all,
>
> This is one question origining from PR61837, which shows that doloop
> pass could introduce some loop invariants against its outer loop when
> doloop performs and prepares the iteration count for hardware
> count register assignment.
>
> Even with Xionghu's simplification patch
> https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543887.html,
> we still have one zero extension un-hoisted, and possibly much
> more if we don't have any chances to simplify it.
>
> The current related pass pipeline is:
>
>     NEXT_PASS (pass_rtl_loop_init);
>     NEXT_PASS (pass_rtl_move_loop_invariants);
>     NEXT_PASS (pass_rtl_unroll_loops);
>     NEXT_PASS (pass_rtl_doloop);
>     NEXT_PASS (pass_rtl_loop_done);
>
> * How about tweaking pass order?
>
>   1. move pass_rtl_move_loop_invariants after pass_rtl_doloop?
>
>     I don't know the historical reason on this order, but one no-go
>     reason may be that it can impact following unroll, which is
>     sensitive to #insn.  Hoisting can probably reduce loop body #insn,
>     easy to break existing things without it there?

It's probably historical up to the point where we did not have the
SSA infrastructure and optimization passes.  I'd indeed be more
worried about invariants created later.

>   2. move pass_rtl_doloop before pass_rtl_move_loop_invariants?
>
>     It looks impractical, pass_rtl_unroll_loops can update the loop
>     niter which is required by doloop preparation, predicting the value
>     correctly sounds impossible.
>
> * rerun hoisting pass?
>
>   3. run pass_rtl_move_loop_invariants again after pass_rtl_doloop?
>
>     It looks practical? But currently on Power it can't hoist all the
>     invariants, it's caused that the pseudo used for count register which
>     isn't taken as "invariant" by pass_rtl_move_loop_invariants, since
>     we still leave the doloop end pattern inside the loop, for example:
>
>       67: r143:SI=r119:DI#0-0x1
>       68: r142:DI=zero_extend(r143:SI)
>       69: r141:DI=r142:DI+0x1            // count
>       ...
>       66: {pc={(r141:DI!=0x1)?L64:pc};r141:DI=r141:DI-0x1;
>            clobber scratch;clobber scratch;}   // doloop_end
>
>     though the doloop_end will be eliminated on Power eventually,
>
>     I noticed that there is one hook doloop_begin, which looks to help
>     ports to introduce loop count pseudo.  By supporting this for rs6000,
>     I can obtain the insns what I expected on Power.
>
>         Original:
>            12: NOTE_INSN_BASIC_BLOCK 4
>            67: %9:SI=%5:SI-0x1           // invariant
>            14: %8:DI=%4:DI-0x1
>            68: %9:DI=zero_extend(%9:SI)  // invariant
>            15: %10:DI=%3:DI
>            69: %9:DI=%9:DI+0x1           // invariant
>            82: ctr:DI=%9:DI
>               REG_DEAD %9:DI
>
>         With rerun hoisting:
>
>            12: NOTE_INSN_BASIC_BLOCK 4
>            69: %8:DI=%5:DI+0x1          // invariant
>            14: %10:DI=%4:DI-0x1
>            84: ctr:DI=%8:DI
>               REG_DEAD %8:DI
>            15: %9:DI=%3:DI
>
>         With rerun hoisting + doloop_begin:
>
>            12: NOTE_INSN_BASIC_BLOCK 4
>            70: ctr:DI=%5:DI
>            14: %10:DI=%4:DI-0x1
>            15: %9:DI=%3:DI
>
>     But I still had the concern that I'm not sure the current hoisting pass
>     will consider register pressure, if the niter related invariants
>     have some derived uses inside the loop, probably creating more
>     register pressure here.
>
> Attached is the initial patch for this naive proposal, I'd like to post it
> first to see whether it's reasonable to proceed.  If it's reasonable, one
> thing to be improved can be to guard it for only the related outer loops
> of which doloop perform succesfully on to save compilation time.
>
> Welcome any comments!

I suggest to try both approaches and count the number of transforms
done in each instance.

Richard.

>
> BR,
> Kewen

Reply via email to