https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116109
Richard Sandiford <rsandifo at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org
--- Comment #4 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
This isn't directly useful given that AArch64 isn't your target, but FWIW, we
do take reduction latencies into account when deciding the
vectorisation/unrolling factor for AArch64. E.g. the testcase produces the
main loop:
.L5:
        ldp     q25, q24, [x1]
        ldp     q30, q31, [x1, 32]
        add     x1, x1, 64
        fmla    v26.2d, v25.2d, v25.2d
        fmla    v27.2d, v24.2d, v24.2d
        fmla    v28.2d, v30.2d, v30.2d
        fmla    v29.2d, v31.2d, v31.2d
        cmp     x3, x1
        bne     .L5
when compiled with -Ofast -mcpu=neoverse-v2. The decision is made by
aarch64_vector_costs::determine_suggested_unroll_factor, which uses issue rates
and reduction latencies recorded by the main costing code.