https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #22 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
For me, with `-fno-vect-cost-model` and without this commit, we generate
https://gist.github.com/Mistuke/d9252bfcb2aa766327c5f377e162f5b7 for the loop.
With the commit, well... it doesn't fit on the screen, but the codegen is
pretty horrible, with

        smlal2  v24.4s, v13.8h, v5.8h
        smull   v31.4s, v30.4h, v17.4h
        add     v20.4s, v20.4s, v11.4s
        smlal2  v29.4s, v3.8h, v6.8h
        smull2  v25.4s, v25.8h, v15.8h
        add     v22.4s, v28.4s, v22.4s
        shrn    v21.4h, v21.4s, 15
        add     v20.4s, v20.4s, v26.4s
        add     v29.4s, v29.4s, v24.4s
        smlal2  v25.4s, v16.8h, v7.8h
        smlal   v31.4s, v18.4h, v8.4h
        smull2  v27.4s, v27.8h, v17.8h
        shrn2   v21.8h, v22.4s, 15
        add     v29.4s, v29.4s, v25.4s
        add     v31.4s, v31.4s, v20.4s
        smlal2  v27.4s, v18.8h, v8.8h
        str     h21, [x5, x9]
        add     x9, x9, 32
        add     x9, x5, x9
        shrn    v31.4h, v31.4s, 15
        st1     {v21.h}[1], [x10]
        add     v27.4s, v27.4s, v29.4s
        st1     {v21.h}[2], [x6]
        add     x6, x7, 20
        add     x10, x1, x21
        st1     {v21.h}[3], [x2]
        add     x2, x7, 24
        add     x7, x7, 28
        st1     {v21.h}[4], [x8]
        shrn2   v31.8h, v27.4s, 15
        st1     {v21.h}[5], [x6]
        lsl     x6, x10, 1
        add     x10, x5, x10, lsl 1
        st1     {v21.h}[6], [x2]
        add     x2, x10, 4
        st1     {v21.h}[7], [x7]
        add     x7, x10, 8
        str     h31, [x5, x6]
        add     x8, x10, 12
        lsl     x1, x1, 1
        add     x6, x6, 32
        st1     {v31.h}[1], [x2]
        add     x2, x10, 16
        st1     {v31.h}[2], [x7]
        add     x7, x10, 20
        st1     {v31.h}[3], [x8]
        add     x8, x10, 24
        add     x10, x10, 28
        st1     {v31.h}[4], [x2]
        st1     {v31.h}[5], [x7]
        add     x11, x1, 32
        st1     {v31.h}[6], [x8]
        add     x11, x0, x11
        st1     {v31.h}[7], [x10]
        add     x10, x1, x25
        ld1h    z31.s, p5/z, [x11]

going on for a while, i.e. single-element lane stores. So with the cost model
disabled, it definitely does get worse with that commit. With the cost model
on there's no difference.
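
To make the store pattern concrete, here is a rough scalar C sketch of what the
tail of that listing appears to be doing. It is not the loop from the PR: the
function names are made up for illustration, and the stride of two halfwords is
only my reading of the 4-byte address steps (x10 + 4, + 8, ... + 28) in the
listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 8 };  /* one 128-bit vector of 16-bit elements */

/* Scalar stand-in for shrn/shrn2 #15: keep bits [30:15] of each 32-bit lane. */
void narrow_shift15(const int32_t in[LANES], int16_t out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = (int16_t)(uint16_t)((uint32_t)in[i] >> 15);
}

/* Stand-in for the str h / st1 {v.h}[1..7] run: one store per lane,
   each needing its own address computation (hypothetical helper). */
void store_lanes_strided(int16_t *dst, ptrdiff_t stride, const int16_t v[LANES])
{
    assert(stride > 0);
    for (int lane = 0; lane < LANES; lane++)
        dst[lane * stride] = v[lane];
}

/* Stand-in for a single full-vector store to adjacent memory
   (hypothetical helper, shown only for contrast). */
void store_contiguous(int16_t *dst, const int16_t v[LANES])
{
    memcpy(dst, v, LANES * sizeof v[0]);
}
```

Calling store_lanes_strided(dst, 2, v) corresponds to the eight separate stores
per vector seen above, while store_contiguous is what one full-vector store
would cover if the destination elements were adjacent.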
