[RFC][PATCH 0/5] Loop unrolling and memory load streams

Kugan Vivekanandarajah Thu, 14 Sep 2017 18:24:58 -0700

While loop unrolling helps to keep the pipeline busy in modern
processors, it can also increase the number of memory streams, resulting
in collisions in the hardware prefetcher that can hurt performance.
This patch series tries to detect such cases and limit loop unrolling.
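To make the problem concrete, here is an illustrative C example (not taken
from the patches) of how completely unrolling an inner loop multiplies the
loads issued by the outer loop. The names sum_columns_loop,
sum_columns_unrolled, ROWS and COLS are hypothetical. The sketch assumes a
small inner trip count that is known at compile time, which is the case
where a complete unroll becomes possible.

```c
#include <stddef.h>

#define ROWS 4    /* hypothetical inner trip count, known at compile time */
#define COLS 1024 /* hypothetical row length */

/* Before complete unrolling: the inner loop contains a single load.
   For a fixed i it walks down column i, jumping COLS elements between
   consecutive accesses.  */
long
sum_columns_loop (int a[ROWS][COLS])
{
  long s = 0;
  for (size_t i = 0; i < COLS; i++)
    for (size_t j = 0; j < ROWS; j++)
      s += a[j][i];
  return s;
}

/* After completely unrolling the inner loop (written by hand for
   ROWS == 4): the outer loop body now contains four separate loads,
   each advancing sequentially through its own row as i grows, i.e.
   four independent memory streams for the prefetcher to track.  */
long
sum_columns_unrolled (int a[ROWS][COLS])
{
  long s = 0;
  for (size_t i = 0; i < COLS; i++)
    s += a[0][i] + a[1][i] + a[2][i] + a[3][i];
  return s;
}
```

Both functions compute the same sum. What matters for the prefetcher is
that the second one has ROWS distinct load instructions per outer
iteration, so a larger inner trip count means more concurrent streams.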


Patch 1: Add separate params for the RTL unroller.

Patch 2: Add the number of hardware prefetchers available to
cpu_prefetch_tune so that it can be used in loop unrolling decisions.

Patch 3: Prevent the tree unroller from completely unrolling inner loops
if that would result in excessive strided loads in the outer loop.

Patch 4: Change iv_analyze_result to take a const_rtx. This is only needed
to make the next patch compile; no functional changes.

Patch 5: Add aarch64_loop_unroll_adjust to limit partial unrolling in RTL
based on the number of strided loads in the loop.
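The kind of limit Patches 2 and 5 describe can be sketched as a simple cap
on the unroll factor. This is a hypothetical, standalone illustration, not
the code in aarch64_loop_unroll_adjust: the function name
limit_unroll_by_prefetchers is invented here. It assumes that every strided
load in the unrolled body occupies one hardware prefetcher stream (as with a
prefetcher that tracks each load instruction separately), so the factor is
reduced until strided_loads * factor fits within the number of prefetchers.

```c
/* Hypothetical sketch, not the patch's implementation.
   nunroll:         unroll factor (copies of the loop body) proposed so far
   strided_loads:   strided loads in one copy of the body
   num_prefetchers: hardware prefetcher streams available (0 = unknown)
   Returns a factor such that strided_loads * factor <= num_prefetchers,
   but never less than 1.  */
unsigned
limit_unroll_by_prefetchers (unsigned nunroll, unsigned strided_loads,
                             unsigned num_prefetchers)
{
  /* No strided loads, or no prefetcher information: leave it alone.  */
  if (strided_loads == 0 || num_prefetchers == 0)
    return nunroll;

  /* Largest factor whose streams still fit in the prefetcher.  */
  unsigned max_factor = num_prefetchers / strided_loads;
  if (max_factor == 0)
    max_factor = 1; /* Even one copy already oversubscribes; don't unroll.  */

  return nunroll < max_factor ? nunroll : max_factor;
}
```

For example, with 2 strided loads per iteration and 8 prefetcher streams, a
requested factor of 8 would be capped at 4, while a loop with no strided
loads keeps its requested factor.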

Bootstrapped and tested on aarch64-linux-gnu (with
-funroll-all-loops). Testing on x86_64-linux-gnu is ongoing.

Thanks,
Kugan
