https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63724

            Bug ID: 63724
           Summary: [AArch64] Inefficient immediate expansion and
                    hoisting.
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ramana at gcc dot gnu.org

For some cases, such as hmmer in SPEC2k6, we currently generate poor code
for AArch64.

float
P7Viterbi(int **mmx, int L, int M, int **imx, int **dmx)
{
  int k;

  for (k = 0; k <= M; k++)
    mmx[0][k] = imx[0][k] = dmx[0][k] = -987654321;

  return 0.0f;
}

At -O2 this generates the following code:

    tbnz    w2, #31, .L4
    ldr    x5, [x3]
    ldr    x4, [x4]
    ldr    x6, [x0]
    mov    x0, 0
.L3:
    mov    w1, 38735
    mov    w3, w1
    movk    w1, 0xc521, lsl 16
    str    w1, [x4, x0, lsl 2]
    movk    w3, 0xc521, lsl 16
    mov    w1, 38735
    str    w3, [x5, x0, lsl 2]
    movk    w1, 0xc521, lsl 16
    str    w1, [x6, x0, lsl 2]
    add    x0, x0, 1
    cmp    w2, w0
    bge    .L3
.L4:
    fmov    s0, wzr
    ret
    .size    P7Viterbi, .-P7Viterbi

whereas it could simply be:

P7Viterbi:
        tbnz    w2, #31, .L4
        ldr     x5, [x3]
        mov     w1, 38735
        ldr     x3, [x4]
        movk    w1, 0xc521, lsl 16
        ldr     x6, [x0]
        mov     x0, 0
.L3:
        str     w1, [x3, x0, lsl 2]
        str     w1, [x5, x0, lsl 2]
        str     w1, [x6, x0, lsl 2]
        add     x0, x0, 1
        cmp     w2, w0
        bge     .L3
.L4:
        fmov    s0, wzr
        ret
        .size   P7Viterbi, .-P7Viterbi
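Both listings build the same 32-bit constant. -987654321 is 0xC521974F, which cannot be encoded in a single move-immediate instruction, so it takes a mov of the low half (38735 = 0x974F) followed by a movk of the high half (0xc521, lsl 16). The small C sketch below checks that arithmetic and models what the two instructions do to the register; the helper names split_imm32 and mov_movk32 are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: split a 32-bit immediate into the two 16-bit
   halves that the mov/movk pair in the listings materialises:
     mov  w1, lo          ; writes lo, zeroes bits 31:16
     movk w1, hi, lsl 16  ; replaces bits 31:16, keeps bits 15:0  */
void split_imm32(int32_t value, uint16_t *lo, uint16_t *hi)
{
    uint32_t bits = (uint32_t)value;   /* two's-complement bit pattern */
    *lo = (uint16_t)(bits & 0xFFFFu);
    *hi = (uint16_t)(bits >> 16);
}

/* Hypothetical helper: rebuild the register value the way the two
   instructions do, one after the other. */
uint32_t mov_movk32(uint16_t lo, uint16_t hi)
{
    uint32_t w = lo;                                /* mov  */
    w = (w & 0x0000FFFFu) | ((uint32_t)hi << 16);   /* movk */
    return w;
}
```

Calling split_imm32(-987654321, &lo, &hi) gives lo == 38735 and hi == 0xC521, the same operands that appear in both listings.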

The hoisting is missed because the AArch64 backend expands const_ints into
their mov/movk sequences too early. Since the mid-end has no "uncse" pass, it
is quite hard to recover once a constant has been expanded into this form so
early in the compiler. The simple solution is to move this logic out into a
separate splitter function. We should also investigate what happens if we
start doing the same for our address computations, but that is the subject of
a separate patch.
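To illustrate the kind of decomposition a splitter would perform, here is a simplified C sketch that turns a 64-bit constant into a movz/movk sequence, skipping all-zero 16-bit chunks. It is an illustration under stated assumptions, not the backend's actual code: the names plan_mov_imm64, replay_steps and struct imm_step are hypothetical, and the real AArch64 expansion also considers movn and bitmask (orr) immediates, which this sketch ignores. The idea suggested above is about when such a decomposition runs: if the move of the whole constant stays a single instruction until after loop-invariant motion, it can be hoisted as a unit and only split afterwards.

```c
#include <stdint.h>

/* One step of a movz/movk sequence (hypothetical representation). */
struct imm_step {
    int      is_movk;  /* 0 = movz (zeroes the other bits), 1 = movk */
    uint16_t imm;      /* 16-bit chunk */
    int      shift;    /* 0, 16, 32 or 48 */
};

/* Hypothetical splitter helper (simplified sketch): decompose a 64-bit
   constant into at most four movz/movk steps, skipping all-zero 16-bit
   chunks.  Ignores movn and bitmask immediates.  Returns the number of
   steps written to out[]. */
int plan_mov_imm64(uint64_t value, struct imm_step out[4])
{
    int n = 0;
    for (int shift = 0; shift < 64; shift += 16) {
        uint16_t chunk = (uint16_t)(value >> shift);
        if (chunk == 0)
            continue;                 /* the first movz already zeroed these bits */
        out[n].is_movk = (n != 0);    /* first step is movz, the rest movk */
        out[n].imm = chunk;
        out[n].shift = shift;
        n++;
    }
    if (n == 0) {                     /* value == 0: a single movz #0 */
        out[0].is_movk = 0;
        out[0].imm = 0;
        out[0].shift = 0;
        n = 1;
    }
    return n;
}

/* Replay the steps to check that they rebuild the original constant. */
uint64_t replay_steps(const struct imm_step *steps, int n)
{
    uint64_t reg = 0;
    for (int i = 0; i < n; i++) {
        uint64_t mask = (uint64_t)0xFFFF << steps[i].shift;
        uint64_t bits = (uint64_t)steps[i].imm << steps[i].shift;
        if (steps[i].is_movk)
            reg = (reg & ~mask) | bits;   /* movk keeps the other bits */
        else
            reg = bits;                   /* movz zeroes everything else */
    }
    return reg;
}
```

For 0xC521974F (the constant from the test case, zero-extended) the sketch yields movz 0x974F followed by movk 0xc521, lsl 16, matching the two instructions in the listings.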

Mine.
