On 10/14/2016 10:29 PM, Andrew Pinski wrote:
>>> >> This patch bumps the iteration count by 1 for loops with the exit at the
>>> >> end so that it represents the number of times the loop body is executed,
>>> >> and therefore removes the need to always execute that first peeled copy.
>>> >> With this change, when the number of executions of the loop is an even
>>> >> multiple of the unroll factor then the code will jump to the unrolled
>>> >> loop immediately instead of executing all the switch code and peeled
>>> >> copies of the loop and then falling into the unrolled loop. This change
>>> >> also reduces code size by removing a peeled copy of the loop.
>>> >> Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?
>> > This patch or
>> > PR rtl-optimization/68212
>> > * cfgloopmanip.c (duplicate_loop_to_header_edge): Use preheader edge
>> > frequency when computing scale factor for peeled copies.
>> > * loop-unroll.c (unroll_loop_runtime_iterations): Fix freq/count
>> > values for switch/peel blocks/edges.
>> > Caused a ~2.7-3.5% regression in coremarks with -funroll-all-loops.
> I should say on ThunderX (aarch64-linux-gnu).
Sorry to hear about the degradation. Do you have more details on which patch
and/or what specifically causes the degradation? This patch should only affect
the execution path outside the unrolled loop (worst case is probably for loops
that execute once). The PR 68212 patch only corrects some of the block
frequency/count values, so they are less inaccurate than they were before.
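
To make the control flow concrete, here is a rough source-level C sketch of
runtime unrolling as described in the quoted patch summary. It is only an
analogy, not the RTL that loop-unroll.c actually emits: the function name
sum_runtime_unrolled, the factor UNROLL = 4, and the peeled out-parameter are
all hypothetical, and n is assumed to be the number of times the loop body
executes (i.e. the bumped iteration count).

```c
#include <assert.h>
#include <stddef.h>

enum { UNROLL = 4 };  /* hypothetical unroll factor, chosen for illustration */

/* Sum a[0..n-1].  n is the number of times the original loop body runs.
   *peeled receives how many peeled copies ran before the unrolled loop.  */
long sum_runtime_unrolled(const int *a, size_t n, size_t *peeled)
{
    assert(a != NULL || n == 0);

    long sum = 0;
    size_t i = 0;
    size_t rem = n % UNROLL;  /* iterations that do not fill a whole unrolled pass */

    *peeled = rem;

    /* Switch code: enter the chain of peeled copies at the right point.
       When rem == 0, no peeled copy executes at all.  */
    switch (rem) {
    case 3: sum += a[i++]; /* fall through */
    case 2: sum += a[i++]; /* fall through */
    case 1: sum += a[i++]; /* fall through */
    case 0: break;
    }

    /* Unrolled loop: the remaining count (n - i) is a multiple of UNROLL.  */
    for (; i < n; i += UNROLL) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    return sum;
}
```

In this sketch, n = 8 selects case 0 and goes straight to the unrolled loop,
while n = 1 runs a single peeled copy and never enters the unrolled loop.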