On Tue, Apr 21, 2020 at 10:42 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> on 2020/4/17 7:32 PM, Richard Biener wrote:
> > On Fri, Apr 17, 2020 at 1:10 PM Kewen.Lin via Gcc <gcc@gcc.gnu.org> wrote:
> >>
> >> Hi all,
> >>
> >> This is a question originating from PR61837, which shows that the
> >> doloop pass can introduce loop invariants with respect to the outer
> >> loop when it prepares the iteration count for the hardware count
> >> register assignment.
> >>
> > I suggest to try both approaches and count the number of transforms
> > done in each instance.
> >
> > Richard.
> >
>
> Hi Richi,
>
> Thanks for the suggestion. I collected some statistics, shown below.
>
> A: default
> B: move pass_rtl_move_loop_invariants after pass_rtl_doloop
> C: rerun pass_rtl_move_loop_invariants after pass_rtl_doloop
> D: C + doloop_begin
>
> Collected by bootstrapping and regression testing on ppc64le Power8,
> configured with languages c,c++,fortran,objc,obj-c++,go.
>
> I counted the #transformations in function move_invariant_reg (before
> label fail). It would probably be better to require inv == repr to filter
> out the replacements with repr, but I expect the trend to be similar.
>
>         A: 802650
>         B: 841476
>         C: 803485 (C1) + 827883 (C2)
>         D: 802622 (D1) + 841476 (D2)
>
> Below, pass_rtl_move_loop_invariants is referred to as "hoisting".
> C1/D1 are the counts for the first hoisting run and C2/D2 for the second.
> The small differences (~0.1%) among A/C1/D1 are most likely noise.
>
> The totals for the two-run configurations (C/D) are almost twice that of
> a single run, which surprised me.  Further investigation suggests that the
> current pass_rtl_move_loop_invariants needs some improvement if we want
> to rerun it.  Take gcc/testsuite/gfortran.dg/inline_matmul_16.f90 at
> -O1 as an example: C1 does 178 transforms and C2 does 165.  This is
> unrelated to the unroll/doloop passes; the result doesn't change when
> they are disabled explicitly.
>
> Currently, without flag_ira_loop_pressure, the regs_used estimation
> isn't good.  I'd expect the invariants hoisted out of a loop by the first
> run to be counted in regs_used when the second run analyzes that loop.
> However, regs_used is 2 for all loops in inline_matmul_16, in both C1
> and C2.  I think this leads the 2nd hoisting to underestimate register
> pressure and hoist more.  With a simple hack that accounts for the
> new_reg created by the 1st hoisting, the 2nd hoisting does far fewer
> moves (57).  This means the statistics comparison above is unfair and
> unreliable.
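As a quick sanity check of the "almost two times" observation, the short Python script below sums the two hoisting runs for C and D and divides each total by the single-run count A. The counts are copied from the table above; the variable names and the `ratio` helper are only illustrative.

```python
# Counts of move_invariant_reg transformations from the table above
# (bootstrap + regtest on ppc64le Power8). Labels follow the A-D legend.
A = 802650
B = 841476
C1, C2 = 803485, 827883
D1, D2 = 802622, 841476


def ratio(x, base=A):
    """Return x relative to the default configuration A."""
    return x / base


# Total transforms per configuration; C and D run hoisting twice.
totals = {"B": B, "C": C1 + C2, "D": D1 + D2}
for name, total in totals.items():
    print(f"{name}: {total} transforms, {ratio(total):.3f}x of A")
# B: 841476 transforms, 1.048x of A
# C: 1631368 transforms, 2.032x of A
# D: 1644098 transforms, 2.048x of A

# The first hoisting run in C and D should match A up to noise.
for name, first in (("C1", C1), ("D1", D1)):
    print(f"{name} vs A: {abs(first - A) / A:.2%} difference")
# C1 vs A: 0.10% difference
# D1 vs A: 0.00% difference
```

With these inputs, C and D come out at about 2.03x and 2.05x of A, while C1 and D1 stay within about 0.1% of A, so the second hoisting run does roughly as many moves as the first.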

OK, so that alone argues against doing C or D without better understanding
and fixing this.  That is, when you invoke invariant motion twice at its
current place the second invocation shouldn't really do any more hoisting,
definitely not a significant amount.

> With flag_ira_loop_pressure, the #transforms become 255 (1st) and
> 68 (2nd).  That looks better, but it might also need further enhancements.
>
> Since rs6000 enables flag_ira_loop_pressure at -O3, I ran a SPEC2017
> performance evaluation on Power8 (against baseline A) with options
> -Ofast -funroll-loops:
>  * B showed 525.x264_r +1.43%, 538.imagick_r +1.23% speedup
>    but 503.bwaves_r -2.74% degradation.
>  * C showed 500.perlbench_r -1.31%, 520.omnetpp_r -2.20% degradation.
>
> The evaluation shows that running hoisting after doloop can give us some
> benefit, but running it twice doesn't give similar gains.  It seems that,
> regardless of flag_ira_loop_pressure, rerunning the pass requires more
> tweaks, probably including the related parameters.  If we go with B, we
> need to figure out what we are missing for bwaves_r.

Of course it also requires benchmarking on other archs.

> BR,
> Kewen
>
