Hi Kugan, On 12/02/18 23:58, Kugan Vivekanandarajah wrote:
Implements a machine reorg pass for aarch64/Falkor to handle prefetcher tag collision. This is strictly not part of the loop unroller but for Falkor, unrolling can make h/w prefetcher performing badly if there are too much tag collisions based on the discussions in https://gcc.gnu.org/ml/gcc/2017-10/msg00178.html.
Could you expand a bit more on what transformation exactly this pass does? From my understanding the loads that use the same base register and offset and have the same destination register are considered part of the same stream by the hardware prefetcher, so for example: ldr x0, [x1, 16] (load1) ... (set x1 to something else) ldr x0, [x1, 16] (load2) will cause the prefetcher to think that both loads are part of the same stream, so this pass tries to rewrite the sequence into: ldr x0, [x1, 16] ... (set x1 to something else) mov tmp, x1 ldr x0, [tmp, 16] Where the tag/signature is the combination of destination x0, base x1 and offset 16. Is this a fair description? I've got some comments on the patch itself
gcc/ChangeLog: 2018-02-12 Kugan Vivekanandarajah <kug...@linaro.org> * config/aarch64/aarch64.c (iv_p): New. (strided_load_p): Likwise. (make_tag): Likesie. (get_load_info): Likewise. (aarch64_reorg): Likewise. (TARGET_MACHINE_DEPENDENT_REORG): Implement new target hook.
New functions need function comments describing the arguments at least. Functions like make_tag, get_load_info etc can get tricky to maintain without some documentation on what they are supposed to accept and return. I think the pass should be enabled at certain optimisation levels, say -O2? I don't think it would be desirable at -Os since it creates extra moves that increase code size. That being said, I would recommend you implement this as an aarch64-specific pass, in a similar way to cortex-a57-fma-steering.c. That way you can register it in aarch64-passes.def and have flexibility as to when exactly the pass gets to run (i.e. you wouldn't be limited by when machine_reorg gets run). Also, I suggest you don't use the "if (aarch64_tune != falkor) return;" way of gating this pass. Do it in a similar way to the FMA steering pass that is, define a new flag in aarch64-tuning-flags.def and use it in the tune_flags field of the falkor tuning struct. Hope this helps, Kyrill