Hi Maxim,

On 30/01/17 11:24, Maxim Kuvyrkov wrote:
This patch series improves -fprefetch-loop-arrays pass through small fixes and 
tweaks, and then enables it for several AArch64 cores.

My tunings were done on and for Qualcomm hardware, with results ranging from 
+0.5% to +1.9% for SPEC2006 INT and from +0.25% to +1.0% for SPEC2006 FP at -O3, 
depending on hardware revision.

This patch series enables restricted -fprefetch-loop-arrays at -O2, which also 
improves SPEC2006 numbers.

The biggest improvements are on 419.mcf and 437.leslie3d, with no serious 
regressions on other benchmarks.

I'm now investigating making -fprefetch-loop-arrays more aggressive for 
Qualcomm hardware, which improves performance on most benchmarks, but also 
causes big regressions on 454.calculix and 462.libquantum.  If I can fix these 
two regressions, prefetching will give another boost to AArch64.

Andrew just posted similar prefetching tunings for Cavium's cores, and the two 
patches have trivial conflicts.  I'll post mine as-is, since it addresses one of 
the comments in the review of Andrew's patch (adding a stand-alone struct for 
tuning parameters).

Andrew, feel free to just copy-paste it to your patch, since it is just a 
mechanical change.

All patches were bootstrapped and regtested on x86_64-linux-gnu and 
aarch64-linux-gnu.

I've tried these patches out on Cortex-A72 and Cortex-A53, with the tuning 
struct entries appropriately modified to enable the changes on those cores.
I'm seeing the mcf and leslie3d improvements as well on Cortex-A72 and 
Cortex-A53 and no noticeable regressions.
I've also verified that the improvements are due to the prefetch instructions 
rather than just the unrolling that the pass does.
So I'm in favor of enabling this for the cores that benefit from it.

Do you plan to get this in for GCC 8?
Thanks,
Kyrill

--
Maxim Kuvyrkov
www.linaro.org