on 2020/4/21 5:43 PM, Richard Biener wrote:
> On Tue, Apr 21, 2020 at 10:42 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>>
>> on 2020/4/17 7:32 PM, Richard Biener wrote:
>>> On Fri, Apr 17, 2020 at 1:10 PM Kewen.Lin via Gcc <gcc@gcc.gnu.org> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> This is a question originating from PR61837, which shows that the
>>>> doloop pass could introduce some loop invariants with respect to its
>>>> outer loop when doloop runs and prepares the iteration count for
>>>> hardware count register assignment.
>>>>
>>> I suggest to try both approaches and count the number of transforms
>>> done in each instance.
>>>
>>
>> Currently, without flag_ira_loop_pressure, the regs_used estimation
>> isn't good.  I'd expect the invs hoisted from the loop the first time
>> to be counted as regs_used in the next regs_used analysis.  Checking
>> regs_used, it's set to 2 for all loops of case inline_matmul_16, for
>> either C1 or C2.  I think this leads the 2nd hoisting to estimate
>> register pressure optimistically and hoist more.  With a simple hack
>> that accounts for the new_reg of the 1st hoisting, I can see the 2nd
>> hoisting has fewer moves (57).  It means the above statistics
>> comparison is unfair and unreliable.
> 
> OK, so that alone argues against doing C or D without better understanding
> and fixing this.  That is, when you invoke invariant motion twice at its
> current place the second invocation shouldn't really do any more hoisting,
> definitely not a significant amount.
> 

Exactly, I was expecting this before doing the data collection, so the
result surprised me.  It looks more feasible to go with B from this
perspective.
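
To make the regs_used point quoted above concrete, here is a toy model in
Python.  It is not GCC code: the function name, the one-register-per-invariant
cost, the register budget and the candidate count are all made up for
illustration; only regs_used = 2 comes from the observation above.  It just
shows why a second run that still sees the stale regs_used keeps hoisting,
while a run that accounts for the first run's new registers does not.

```python
def hoist_pass(num_candidates, regs_used, regs_available):
    """Toy model: count how many invariants get hoisted.

    Assumption: every hoisted invariant costs exactly one new register,
    and hoisting stops once the estimate would exceed the budget.
    """
    new_regs = 0
    for _ in range(num_candidates):
        if regs_used + new_regs + 1 > regs_available:
            break
        new_regs += 1
    return new_regs


REGS_AVAILABLE = 8   # hypothetical register budget
REGS_USED = 2        # value observed for all loops in the case above
CANDIDATES = 10      # hypothetical number of hoistable invariants

# First run of the pass.
first = hoist_pass(CANDIDATES, REGS_USED, REGS_AVAILABLE)
remaining = CANDIDATES - first

# Second run with a stale estimate: regs_used ignores the first run.
second_stale = hoist_pass(remaining, REGS_USED, REGS_AVAILABLE)

# Second run where the first run's new registers count as used.
second_adjusted = hoist_pass(remaining, REGS_USED + first, REGS_AVAILABLE)

print("1st run hoists:", first)
print("2nd run, stale regs_used:", second_stale)
print("2nd run, adjusted regs_used:", second_adjusted)
```

In this model the adjusted second run hoists nothing, which matches the
expectation that a second invocation at the same place shouldn't do a
significant amount of extra hoisting.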

>> With flag_ira_loop_pressure, the #transforms become 255 (1st) and
>> 68 (2nd).  It looks better but might also need more enhancements?
>>
>> Since rs6000 sets flag_ira_loop_pressure at O3, I did SPEC2017
>> performance evaluation on Power8 (against baseline A) with option
>> -Ofast -funroll-loops:
>>  * B showed 525.x264_r +1.43%, 538.imagick_r +1.23% speedup
>>    but 503.bwaves_r -2.74% degradation.
>>  * C showed 500.perlbench_r -1.31%, 520.omnetpp_r -2.20% degradation.
>>
>> The evaluation shows that running hoisting after doloop can give us
>> some benefits, but running it twice doesn't give us similar gains.
>> It looks like, regardless of flag_ira_loop_pressure, rerunning the
>> pass requires more tweaks, probably considering those related
>> parameters.  If we go with B, we need to figure out what we miss for
>> bwaves_r.
> 
> Of course it also requires benchmarking on other archs.
> 

Yes, I'll also check with -O2, which doesn't have flag_ira_loop_pressure,
both w/ and w/o unrolling.


BR,
Kewen
