On Thu, Sep 22, 2016 at 12:10 PM, Pat Haugen wrote:
> I noticed the loop unroller peels an extra copy of the loop before it enters
> the switch block code to round the iteration count to a multiple of the
> unroll factor. This peeled copy is only needed for the case where the exit
> test is at the beginning of the loop since in that case it inserts the test
> for zero peel iterations before that peeled copy.
> This patch bumps the iteration count by 1 for loops with the exit at the end
> so that it represents the number of times the loop body is executed, and
> therefore removes the need to always execute that first peeled copy. With
> this change, when the number of executions of the loop is an exact multiple of
> the unroll factor then the code will jump to the unrolled loop immediately
> instead of executing all the switch code and peeled copies of the loop and
> then falling into the unrolled loop. This change also reduces code size by
> removing a peeled copy of the loop.
> Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?
Either this patch or

* cfgloopmanip.c (duplicate_loop_to_header_edge): Use preheader edge
frequency when computing scale factor for peeled copies.
* loop-unroll.c (unroll_loop_runtime_iterations): Fix freq/count
values for switch/peel blocks/edges.

caused a ~2.7-3.5% regression in CoreMark with -funroll-all-loops.
> 2016-09-22 Pat Haugen <pthau...@us.ibm.com>
> * loop-unroll.c (unroll_loop_runtime_iterations): Condition initial
> loop peel to loops with exit test at the beginning.
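
For readers following along, here is a rough source-level sketch of the two shapes the quoted description talks about, for a loop with the exit test at the end. It is only an illustration and not the RTL that loop-unroll.c emits: the names sum_before, sum_after and UNROLL, the fixed unroll factor of 4, the peeled counter, and the `if (i < n)` guard in front of the unrolled loop are all assumptions made to keep the example small and safe to run.

```c
#include <stddef.h>

#define UNROLL 4 /* hypothetical unroll factor; the cases below assume 4 */

/* Both functions sum a[0..n-1] for n >= 1 and report through *peeled how
   many peeled (non-unrolled) copies of the loop body were executed. */

/* Shape before the patch: one copy is always peeled, then the switch
   dispatches on the remaining count modulo UNROLL. */
long sum_before(const int *a, size_t n, size_t *peeled)
{
    long s = 0;
    size_t i = 0;
    *peeled = 0;

    s += a[i++]; (*peeled)++;          /* unconditional first peel */
    switch ((n - 1) & (UNROLL - 1)) {
    case 3: s += a[i++]; (*peeled)++;  /* fall through */
    case 2: s += a[i++]; (*peeled)++;  /* fall through */
    case 1: s += a[i++]; (*peeled)++;  /* fall through */
    default: break;
    }
    if (i < n) {                       /* guard: simplification of this sketch */
        do {
            s += a[i]; s += a[i + 1]; s += a[i + 2]; s += a[i + 3];
            i += UNROLL;
        } while (i < n);
    }
    return s;
}

/* Shape after the patch: the count is the number of body executions, so
   the switch alone decides how many copies to peel. */
long sum_after(const int *a, size_t n, size_t *peeled)
{
    long s = 0;
    size_t i = 0;
    *peeled = 0;

    switch (n & (UNROLL - 1)) {
    case 3: s += a[i++]; (*peeled)++;  /* fall through */
    case 2: s += a[i++]; (*peeled)++;  /* fall through */
    case 1: s += a[i++]; (*peeled)++;  /* fall through */
    default: break;                    /* multiple of UNROLL: nothing peeled */
    }
    if (i < n) {                       /* guard: simplification of this sketch */
        do {
            s += a[i]; s += a[i + 1]; s += a[i + 2]; s += a[i + 3];
            i += UNROLL;
        } while (i < n);
    }
    return s;
}
```

With n equal to 8, sum_before runs four peeled copies and then one pass of the unrolled loop, while sum_after runs no peeled copies and goes straight into two passes of the unrolled loop. Whether skipping that peel helps or hurts a given benchmark is a separate question that the sketch does not address.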