On Thu, Sep 14, 2017 at 6:33 PM, Kugan Vivekanandarajah
<[email protected]> wrote:
> This patch adds aarch64_loop_unroll_adjust to limit partial unrolling
> in rtl based on strided-loads in loop.
Can you expand on this some more, e.g. with an example of where this
helps? I am trying to better understand your counting scheme, since
it seems the count is based on the number of loads rather than on
cache lines.
What do you mean by a strided load?
Doesn't this function overcount when you have:
for (int i = 1; i < 1024; i++)
  {
    t += a[i-1] * a[i];
  }
if it is counting based on load addresses rather than on cache lines?
It also seems to do some weird counting when you have:
for (int i = 1; i < 1024; i++)
  {
    t += a[(i-1)*N + i] * a[i*N + i];
  }
That is, when the memory address is of the form:
(PLUS (REG) (REG))
It also seems to overcount when loading from the same pointer twice.
In my micro-arch, prefetch slots are allocated per cache-line miss,
so this would be overcounting by a factor of 2.
Thanks,
Andrew
>
> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2017-09-12 Kugan Vivekanandarajah <[email protected]>
>
> * cfgloop.h (iv_analyze_biv): Export.
> * loop-iv.c (iv_analyze_biv): Likewise.
> * config/aarch64/aarch64.c (strided_load_p): New.
> (insn_has_strided_load): New.
> (count_strided_load_rtl): New.
> (aarch64_loop_unroll_adjust): New.