On Fri, Oct 14, 2016 at 8:28 PM, Andrew Pinski <pins...@gmail.com> wrote:
> On Thu, Sep 22, 2016 at 12:10 PM, Pat Haugen
> <pthau...@linux.vnet.ibm.com> wrote:
>> I noticed the loop unroller peels an extra copy of the loop before it enters 
>> the switch block code to round the iteration count to a multiple of the 
>> unroll factor. This peeled copy is only needed for the case where the exit 
>> test is at the beginning of the loop since in that case it inserts the test 
>> for zero peel iterations before that peeled copy.
>> This patch bumps the iteration count by 1 for loops with the exit at the end 
>> so that it represents the number of times the loop body is executed, and 
>> therefore removes the need to always execute that first peeled copy. With 
>> this change, when the number of executions of the loop is an exact multiple 
>> of the unroll factor then the code will jump to the unrolled loop 
>> immediately instead of executing all the switch code and peeled copies of 
>> the loop and then falling into the unrolled loop. This change also reduces 
>> code size by removing a peeled copy of the loop.
>> Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?
> Either this patch or the following one:
> PR rtl-optimization/68212
> * cfgloopmanip.c (duplicate_loop_to_header_edge): Use preheader edge
> frequency when computing scale factor for peeled copies.
> * loop-unroll.c (unroll_loop_runtime_iterations): Fix freq/count
> values for switch/peel blocks/edges.
> caused a ~2.7-3.5% regression in CoreMark with -funroll-all-loops.

I should add that this regression was measured on ThunderX (aarch64-linux-gnu).


> Thanks,
> Andrew
>> 2016-09-22  Pat Haugen  <pthau...@us.ibm.com>
>>         * loop-unroll.c (unroll_loop_runtime_iterations): Condition initial
>>         loop peel to loops with exit test at the beginning.
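
For readers following the thread, here is a rough C-level sketch of the
shape change Pat describes for a loop with the exit test at the end,
unrolled by 4. It is only an analogy: GCC does this on RTL in
unroll_loop_runtime_iterations, and the function names (sum_simple,
sum_before, sum_after) and the fixed unroll factor are hypothetical,
chosen just for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop with the exit test at the end; the body runs n times.
   Precondition (assumed for this sketch): n >= 1. */
long sum_simple(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    assert(n >= 1);
    do {
        s += a[i];
        i++;
    } while (i < n);
    return s;
}

/* Sketch of the old shape: one peeled copy always runs, then the switch
   dispatches on the remaining count modulo the unroll factor. */
long sum_before(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    assert(n >= 1);
    s += a[i++];                      /* extra peeled copy, always executed */
    switch ((n - 1) % 4) {
    case 3: s += a[i++];              /* fall through */
    case 2: s += a[i++];              /* fall through */
    case 1: s += a[i++];              /* fall through */
    case 0: break;
    }
    while (i < n) {                   /* unrolled loop, 4 bodies per trip */
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
        i += 4;
    }
    return s;
}

/* Sketch of the new shape: the count is the number of body executions,
   so a multiple of 4 goes straight to the unrolled loop. */
long sum_after(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    assert(n >= 1);
    switch (n % 4) {
    case 3: s += a[i++];              /* fall through */
    case 2: s += a[i++];              /* fall through */
    case 1: s += a[i++];              /* fall through */
    case 0: break;
    }
    while (i < n) {                   /* unrolled loop, 4 bodies per trip */
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
        i += 4;
    }
    return s;
}
```

With n = 4, sum_before runs the always-executed copy plus three switch
copies and never enters the unrolled loop, while sum_after skips the
peeled copies and runs the unrolled loop once. Whether that shape is a
win on a given target is a separate question, as the numbers above
suggest.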
