https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71307

            Bug ID: 71307
           Summary: [7 Regression] Code quality regression with lane
                    extraction arm_neon.h intrinsics on aarch64
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

After r236630 I'm seeing the following testcase regress on aarch64 at -O2:

#include "arm_neon.h"

float64x1_t
test_copy_laneq_f64 (float64x1_t a, float64x2_t b)
{
  return vset_lane_f64 (vgetq_lane_f64 (b, 1), a, 0);
}

Before the commit it would generate a single:
test_copy_laneq_f64:
        dup     d0, v1.d[1]
        ret

but now on current trunk the codegen is:
test_copy_laneq_f64:
        umov    x0, v1.d[1]
        fmov    d0, x0
        ret


The optimised tree dump from the "good" case is:
test_copy_laneq_f64 (float64x1_t a, float64x2_t b)
{
  float64x1_t __vec;
  double _4;

  <bb 2>:
  __builtin_aarch64_im_lane_boundsi (16, 8, 1);
  _4 = BIT_FIELD_REF <b_2(D), 64, 64>;
  __builtin_aarch64_im_lane_boundsi (8, 8, 0);
  __vec_7 = VIEW_CONVERT_EXPR<float64x1_t>(_4);
  return __vec_7;

}

and in the "bad" case it's:
test_copy_laneq_f64 (float64x1_t a, float64x2_t b)
{
  __Float64x1_t _4;
  double _5;

  <bb 2>:
  __builtin_aarch64_im_lane_boundsi (16, 8, 1);
  _5 = BIT_FIELD_REF <b_2(D), 64, 64>;
  __builtin_aarch64_im_lane_boundsi (8, 8, 0);
  _4 = BIT_FIELD_REF <_5, 64, 0>;
  return _4;

}

Is there something the target needs to do to handle such BIT_FIELD_REFs of scalars, or should the midend fold this back into a VIEW_CONVERT_EXPR?
