https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114577

            Bug ID: 114577
           Summary: Inefficient codegen for SVE/NEON bridge
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

The following sequence:

#include <arm_neon_sve_bridge.h>

svint32_t f (int *a, int *b)
{
  int32x4_t va = vld1q_s32 (a);
  svint32_t za = svset_neonq_s32 (svundef_s32 (), va);
  return za;
}

compiled with -O2 -march=armv9-a

is expected to be a simple load, but GCC generates:

f:
        ldr     q31, [x0]
        ptrue   p3.s, vl4
        sel     z0.s, p3, z31.s, z0.s
        ret

instead of the expected code (as generated by clang):

f:                                      // @f
        ldr     q0, [x0]
        ret

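It looks like GCC's implementation of svset_neonq_s32 with an svundef first
argument does not become a view_convert/subreg. Because the lanes above the
low 128 bits come from an undefined value, the ptrue/sel merge is unnecessary
and the result of the ldr q load can be used directly.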
