https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116109

Richard Sandiford <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #4 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
This probably isn't directly useful, since AArch64 isn't your target, but
FWIW, we do take reduction latencies into account when deciding the
vectorisation/unrolling factor for AArch64.  E.g. for the testcase we
generate the following main loop:

.L5:
        ldp     q25, q24, [x1]
        ldp     q30, q31, [x1, 32]
        add     x1, x1, 64
        fmla    v26.2d, v25.2d, v25.2d
        fmla    v27.2d, v24.2d, v24.2d
        fmla    v28.2d, v30.2d, v30.2d
        fmla    v29.2d, v31.2d, v31.2d
        cmp     x3, x1
        bne     .L5

when compiled with -Ofast -mcpu=neoverse-v2.  The decision is made by
aarch64_vector_costs::determine_suggested_unroll_factor, which uses issue rates
and reduction latencies recorded by the main costing code.
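
For illustration only, here is a C sketch of the kind of source loop the
assembly above corresponds to.  The testcase itself is not quoted in this
comment, so the sum-of-squares shape is an assumption inferred from the fmla
operands (each accumulates vN * vN), and the function names are hypothetical.
The second function spells out by hand what the four independent accumulators
v26-v29 (two double lanes each, so eight partial sums) achieve: consecutive
FMAs no longer wait on each other's result.

```c
#include <stddef.h>

/* Assumed shape of the reduction: a single accumulator, so every
   multiply-add depends on the result of the previous one.  */
double sum_squares(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * x[i];
    return sum;
}

/* Hand-written analogue of the generated loop: eight partial sums
   (4 vector accumulators x 2 lanes) that are only combined at the end.
   Reassociating the additions like this is what -Ofast permits.  */
double sum_squares_unrolled(const double *x, size_t n)
{
    double acc[8] = { 0.0 };
    size_t i = 0;

    /* Main loop: 8 doubles (64 bytes) per iteration, matching the
       "add x1, x1, 64" in the assembly.  */
    for (; i + 8 <= n; i += 8)
        for (size_t j = 0; j < 8; j++)
            acc[j] += x[i + j] * x[i + j];

    /* Combine the partial sums.  */
    double sum = 0.0;
    for (size_t j = 0; j < 8; j++)
        sum += acc[j];

    /* Scalar epilogue for the remaining elements.  */
    for (; i < n; i++)
        sum += x[i] * x[i];
    return sum;
}
```

Because the additions are reassociated, the two functions can differ in the
last bits for general floating-point inputs; they agree exactly when every
intermediate value is exactly representable.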
