https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122408

            Bug ID: 122408
           Summary: [Aarch64] Wrong auto-vectorization for complex AXPY
                    with conjugation (CONJG)
           Product: gcc
           Version: 15.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gcc-bugzilla at slmaertens dot dev
  Target Milestone: ---

Created attachment 62626
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62626&action=edit
Reproducer for auto-vectorization bug.

Since gfortran 15.2.0, the compiler auto-vectorizes the following loop using
the fcmla (floating-point complex multiply-accumulate) instruction:

  DO I = 1, LASTC
    C( 1, I ) = C( 1, I ) - TAU * DCONJG( WORK( I ) )
  END DO
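For reference, here is a minimal C sketch of what the loop is expected to compute. It uses C99 `double complex` in place of Fortran `COMPLEX*16`. The function name `update_row`, the pointer `c1` (standing for `C( 1, 1 )`) and the column stride `ldc` are hypothetical. They assume `C` is stored column-major, as Fortran arrays are, so `C( 1, I )` sits `(I-1)*ldc` elements after `C( 1, 1 )`.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical scalar reference for the Fortran loop
 *   DO I = 1, LASTC
 *     C( 1, I ) = C( 1, I ) - TAU * DCONJG( WORK( I ) )
 *   END DO
 * c1 points at C(1,1); ldc is the (assumed) leading dimension of C,
 * so C(1,I) is c1[(I-1)*ldc] in 0-based C indexing. */
void update_row(double complex *c1, size_t ldc, const double complex *work,
                size_t lastc, double complex tau)
{
    for (size_t i = 0; i < lastc; ++i) {
        /* conj() is the C99 counterpart of DCONJG */
        c1[i * ldc] = c1[i * ldc] - tau * conj(work[i]);
    }
}
```

Only every `ldc`-th element of `c1` is written. The other elements of each column are left untouched, matching the strided `add x19, x19, x8` in both listings.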


Without parentheses around DCONJG, the loop is vectorized:

  L20:
        movi    v6.4s, 0
        ldr     q7, [x2], 16
        ldr     q16, [x19]
        fcmla   v6.2d, v18.2d, v7.2d, #0
        fcmla   v6.2d, v18.2d, v7.2d, #90
        fsub    v6.2d, v16.2d, v6.2d
        str     q6, [x19]
        add     x19, x19, x8
        cmp     x2, x5
        bne     L20


With parentheses around DCONJG, the loop stays scalar:

  L20:
        ldp     d6, d5, [x0], 16
        ldp     d16, d7, [x19, 16]
        fmul    d4, d17, d6
        fnmul   d3, d5, d17
        fmsub   d4, d5, d18, d4
        fnmsub  d3, d18, d6, d3
        fsub    d4, d7, d4
        fsub    d3, d16, d3
        stp     d3, d4, [x19, 16]
        add     x19, x19, x8
        cmp     x0, x5
        bne     L20

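The auto-vectorized code misses the complex conjugation: the rotation pair
#0/#90 computes the plain product TAU * WORK( I ) instead of
TAU * DCONJG( WORK( I ) ).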
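To make the missing conjugation concrete, here is a scalar C model of one double-precision lane of FCMLA (Vd += rot(Vn) * Vm). It is written from my reading of the Arm Architecture Reference Manual pseudocode and should be treated as a sketch, not a specification. In the vectorized listing, `v18` is loop-invariant (presumably `TAU`) and `v7` is reloaded from `[x2]` on every iteration (presumably `WORK( I )`). The helper names `fcmla_lane`, `emitted_product` and `conj_product` are hypothetical.

```c
#include <complex.h>

/* Hypothetical model of one 64-bit complex lane of AArch64 FCMLA:
 *   #0  : d.re += n.re*m.re;  d.im += n.re*m.im
 *   #90 : d.re -= n.im*m.im;  d.im += n.im*m.re
 *   #180: d.re -= n.re*m.re;  d.im -= n.re*m.im
 *   #270: d.re += n.im*m.im;  d.im -= n.im*m.re
 * Any other rot value leaves the accumulator unchanged. */
double complex fcmla_lane(double complex d, double complex n,
                          double complex m, int rot)
{
    double re = creal(d), im = cimag(d);
    switch (rot) {
    case 0:   re += creal(n) * creal(m); im += creal(n) * cimag(m); break;
    case 90:  re -= cimag(n) * cimag(m); im += cimag(n) * creal(m); break;
    case 180: re -= creal(n) * creal(m); im -= creal(n) * cimag(m); break;
    case 270: re += cimag(n) * cimag(m); im -= cimag(n) * creal(m); break;
    default:  break;
    }
    return re + im * I;
}

/* What the vectorized loop computes: #0 + #90 with Vn = TAU and
 * Vm = WORK(I), i.e. TAU * WORK(I) with no conjugation. */
double complex emitted_product(double complex tau, double complex w)
{
    double complex acc = 0.0;            /* movi    v6.4s, 0          */
    acc = fcmla_lane(acc, tau, w, 0);    /* fcmla   v6.2d, ..., #0    */
    acc = fcmla_lane(acc, tau, w, 90);   /* fcmla   v6.2d, ..., #90   */
    return acc;
}

/* One rotation pair that does yield TAU * conj(WORK(I)) in this model:
 * #0 + #270 computes conj(Vn) * Vm, so WORK(I) goes in Vn and TAU in Vm. */
double complex conj_product(double complex tau, double complex w)
{
    double complex acc = 0.0;
    acc = fcmla_lane(acc, w, tau, 0);
    acc = fcmla_lane(acc, w, tau, 270);
    return acc;
}
```

With TAU = 0.5+1i and WORK( I ) = 1+1i, `emitted_product` gives -0.5+1.5i (the unconjugated product), while `conj_product` gives 1.5+0.5i, which equals TAU * DCONJG( WORK( I ) ). Whether a fixed compiler would emit this particular operand order is not something this model establishes.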

[email protected] already figured out what is going on here:
https://github.com/iains/gcc-darwin-arm64/issues/148#issuecomment-3438953784

This should affect all AArch64 targets, but I can currently reproduce it only
on macOS. The reproducer is attached (compile with -O2).
