Hi Kyrill, On 13 February 2018 at 20:47, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote: > Hi Kugan, > > On 12/02/18 23:58, Kugan Vivekanandarajah wrote: >> >> Implements a machine reorg pass for aarch64/Falkor to handle >> prefetcher tag collision. This is strictly not part of the loop >> unroller but for Falkor, unrolling can make h/w prefetcher performing >> badly if there are too much tag collisions based on the discussions in >> https://gcc.gnu.org/ml/gcc/2017-10/msg00178.html. >> > > Could you expand a bit more on what transformation exactly this pass does?
This is similar to what LLVM does in https://reviews.llvm.org/D35366. Falkor hardware prefetcher works well when signature of the prefetches (or tags as computed in the patch - similar to LLVM) are different for different memory streams. If different memory streams have the same signature, it can result in bad performance. This machine reorg pass tries to change the signature of memory loads by changing the base register with a free register. > From my understanding the loads that use the same base > register and offset and have the same destination register > are considered part of the same stream by the hardware prefetcher, so for > example: > ldr x0, [x1, 16] (load1) > ... (set x1 to something else) > ldr x0, [x1, 16] (load2) > > will cause the prefetcher to think that both loads are part of the same > stream, > so this pass tries to rewrite the sequence into: > ldr x0, [x1, 16] > ... (set x1 to something else) > mov tmp, x1 > ldr x0, [tmp, 16] > > Where the tag/signature is the combination of destination x0, base x1 and > offset 16. > Is this a fair description? This is precisely what is happening. > > I've got some comments on the patch itself > >> gcc/ChangeLog: >> >> 2018-02-12 Kugan Vivekanandarajah <kug...@linaro.org> >> >> * config/aarch64/aarch64.c (iv_p): New. >> (strided_load_p): Likwise. >> (make_tag): Likesie. >> (get_load_info): Likewise. >> (aarch64_reorg): Likewise. >> (TARGET_MACHINE_DEPENDENT_REORG): Implement new target hook. > > > New functions need function comments describing the arguments at least. > Functions like make_tag, get_load_info etc can get tricky to maintain > without > some documentation on what they are supposed to accept and return. I wil add the comments. > > I think the pass should be enabled at certain optimisation levels, say -O2? > I don't think it would be desirable at -Os since it creates extra moves that > increase code size. Ok, I will change this. > > That being said, I would recommend you implement this as an aarch64-specific > pass, > in a similar way to cortex-a57-fma-steering.c. That way you can register it > in > aarch64-passes.def and have flexibility as to when exactly the pass gets to > run > (i.e. you wouldn't be limited by when machine_reorg gets run). > > Also, I suggest you don't use the "if (aarch64_tune != falkor) return;" way > of > gating this pass. Do it in a similar way to the FMA steering pass that is, > define a new flag in aarch64-tuning-flags.def and use it in the tune_flags > field > of the falkor tuning struct. Ok, I will revise the patch. Thanks, Kugan > > Hope this helps, > Kyrill