On Thu, Sep 14, 2017 at 6:33 PM, Kugan Vivekanandarajah
<kugan.vivekanandara...@linaro.org> wrote:
> This patch adds aarch64_loop_unroll_adjust to limit partial unrolling
> in rtl based on strided-loads in loop.
Can you expand on this some more? For example, give a case where this
helps. I am trying to better understand your counting scheme, since it
seems the count is based on the number of loads rather than on cache
lines. What exactly do you mean by a strided load?

Doesn't this function overcount when you have:

  for (int i = 1; i < 1024; i++)
    t += a[i-1] * a[i];

if what actually matters is cache lines rather than individual load
addresses?

It also seems to do some odd counting when you have:

  for (int i = 1; i < 1024; i++)
    t += a[(i-1)*N + i] * a[i*N + i];

that is, an address of the form:

  (plus (reg) (reg))

It also seems to overcount when loading from the same pointer twice.
On my micro-arch, the number of prefetch slots is based on cache-line
misses, so this would be overcounting by a factor of 2.

Thanks,
Andrew

>
> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2017-09-12  Kugan Vivekanandarajah  <kug...@linaro.org>
>
>         * cfgloop.h (iv_analyze_biv): Export.
>         * loop-iv.c: Likewise.
>         * config/aarch64/aarch64.c (strided_load_p): New.
>         (insn_has_strided_load): New.
>         (count_strided_load_rtl): New.
>         (aarch64_loop_unroll_adjust): New.