Hi Kugan,

On 12/02/18 23:58, Kugan Vivekanandarajah wrote:
Implements a machine reorg pass for aarch64/Falkor to handle
prefetcher tag collision. This is strictly not part of the loop
unroller but for Falkor, unrolling can make h/w prefetcher performing
badly if there are too much tag collisions based on the discussions in

Could you expand a bit more on what transformation exactly this pass does?
From my understanding the loads that use the same base
register and offset and have the same destination register
are considered part of the same stream by the hardware prefetcher, so for 
ldr x0, [x1, 16] (load1)
... (set x1 to something else)
ldr x0, [x1, 16] (load2)

will cause the prefetcher to think that both loads are part of the same stream,
so this pass tries to rewrite the sequence into:
ldr x0, [x1, 16]
... (set x1 to something else)
mov tmp, x1
ldr x0, [tmp, 16]

Where the tag/signature is the combination of destination x0, base x1 and 
offset 16.
Is this a fair description?

I've got some comments on the patch itself


2018-02-12  Kugan Vivekanandarajah <kug...@linaro.org>

    * config/aarch64/aarch64.c (iv_p): New.
    (strided_load_p): Likwise.
    (make_tag): Likesie.
    (get_load_info): Likewise.
    (aarch64_reorg): Likewise.
    (TARGET_MACHINE_DEPENDENT_REORG): Implement new target hook.

New functions need function comments describing the arguments at least.
Functions like make_tag, get_load_info etc can get tricky to maintain without
some documentation on what they are supposed to accept and return.

I think the pass should be enabled at certain optimisation levels, say -O2?
I don't think it would be desirable at -Os since it creates extra moves that
increase code size.

That being said, I would recommend you implement this as an aarch64-specific 
in a similar way to cortex-a57-fma-steering.c. That way you can register it in
aarch64-passes.def and have flexibility as to when exactly the pass gets to run
(i.e. you wouldn't be limited by when machine_reorg gets run).

Also, I suggest you don't use the "if (aarch64_tune != falkor) return;" way of
gating this pass. Do it in a similar way to the FMA steering pass that is,
define a new flag in aarch64-tuning-flags.def and use it in the tune_flags field
of the falkor tuning struct.

Hope this helps,

Reply via email to