On Thu, Sep 14, 2017 at 6:33 PM, Kugan Vivekanandarajah
<kugan.vivekanandara...@linaro.org> wrote:
> This patch adds aarch64_loop_unroll_adjust to limit partial unrolling
> in rtl based on strided-loads in loop.
Can you expand on this some more? For example, give a case where this
helps. I am trying to better understand your counting scheme, since it
seems the count is based on the number of loads rather than on cache
lines. What exactly do you mean by a strided load?

Doesn't this function overcount when you have:

  for (int i = 1; i < 1024; i++)
    t += a[i-1] * a[i];

if what actually matters is cache lines rather than individual load
addresses?

It also seems to do some odd counting when you have:

  for (int i = 1; i < 1024; i++)
    t += a[(i-1)*N + i] * a[i*N + i];

that is, an address of the form:

  (plus (reg) (reg))

It also seems to overcount when loading from the same pointer twice.
On my micro-arch, the number of prefetch slots is based on cache-line
misses, so this would be overcounting by a factor of 2.

Thanks,
Andrew

>
> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2017-09-12  Kugan Vivekanandarajah  <kug...@linaro.org>
>
>         * cfgloop.h (iv_analyze_biv): Export.
>         * loop-iv.c: Likewise.
>         * config/aarch64/aarch64.c (strided_load_p): New.
>         (insn_has_strided_load): New.
>         (count_strided_load_rtl): New.
>         (aarch64_loop_unroll_adjust): New.