on 2020/4/21 5:43 PM, Richard Biener wrote:
> On Tue, Apr 21, 2020 at 10:42 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>>
>> on 2020/4/17 7:32 PM, Richard Biener wrote:
>>> On Fri, Apr 17, 2020 at 1:10 PM Kewen.Lin via Gcc <gcc@gcc.gnu.org> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> This is one question originating from PR61837, which shows that the doloop
>>>> pass can introduce some loop invariants with respect to the outer loop when
>>>> doloop runs and prepares the iteration count for hardware
>>>> count register assignment.
>>>>
>>> I suggest to try both approaches and count the number of transforms
>>> done in each instance.
>>>
>>
>> Currently, without flag_ira_loop_pressure, the regs_used estimation
>> isn't good. I'd expect the invariants hoisted from the loop the first
>> time to be counted in regs_used at the next regs_used analysis.
>> Checking regs_used, it's set to 2 for all loops of case
>> inline_matmul_16, in either C1 or C2. I think this leads the 2nd
>> hoisting to estimate register pressure optimistically and hoist more.
>> With a simple hack that takes the 1st hoisting's new_reg into account,
>> I can see the 2nd hoisting performs fewer moves (57). This means the
>> statistics comparison above is unfair and unreliable.
>
> OK, so that alone argues against doing C or D without better understanding
> and fixing this. That is, when you invoke invariant motion twice at its
> current place the second invocation shouldn't really do any more hoisting,
> definitely not a significant amount.
>
Exactly. I was expecting this before collecting the data, so the result surprised me. From this perspective, going with B looks more feasible.

>> With flag_ira_loop_pressure, the number of transforms becomes 255 (1st)
>> and 68 (2nd). That looks better, but it might also need more enhancements?
>>
>> Since rs6000 sets flag_ira_loop_pressure at O3, I did a SPEC2017
>> performance evaluation on Power8 (against baseline A) with options
>> -Ofast -funroll-loops:
>> * B showed 525.x264_r +1.43%, 538.imagick_r +1.23% speedup
>>   but 503.bwaves_r -2.74% degradation.
>> * C showed 500.perlbench_r -1.31%, 520.omnetpp_r -2.20% degradation.
>>
>> The evaluation shows that running hoisting after doloop can give us some
>> benefits, but running it twice doesn't give similar gains. It looks like,
>> regardless of flag_ira_loop_pressure, rerunning the pass requires more
>> tweaks, probably involving the related parameters. If we go with B, we
>> need to figure out what we are missing for bwaves_r.
>
> Of course it also requires benchmarking on other archs.
>
Yes, I'll also check with -O2, which doesn't enable flag_ira_loop_pressure, both with and without unrolling.

BR,
Kewen