https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71307
Bug ID: 71307
Summary: [7 Regression] Code quality regression with lane extraction arm_neon.h intrinsics on aarch64
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ktkachov at gcc dot gnu.org
Target Milestone: ---
Target: aarch64

After r236630 I'm seeing the following testcase regress on aarch64 at -O2:

#include "arm_neon.h"

float64x1_t
test_copy_laneq_f64 (float64x1_t a, float64x2_t b)
{
  return vset_lane_f64 (vgetq_lane_f64 (b, 1), a, 0);
}

Before the commit it would generate a single instruction:

test_copy_laneq_f64:
        dup     d0, v1.d[1]
        ret

but now on current trunk the codegen is:

test_copy_laneq_f64:
        umov    x0, v1.d[1]
        fmov    d0, x0
        ret

The optimised tree dump from the "good" case is:

test_copy_laneq_f64 (float64x1_t a, float64x2_t b)
{
  float64x1_t __vec;
  double _4;

  <bb 2>:
  __builtin_aarch64_im_lane_boundsi (16, 8, 1);
  _4 = BIT_FIELD_REF <b_2(D), 64, 64>;
  __builtin_aarch64_im_lane_boundsi (8, 8, 0);
  __vec_7 = VIEW_CONVERT_EXPR<float64x1_t>(_4);
  return __vec_7;
}

and in the "bad" case it's:

test_copy_laneq_f64 (float64x1_t a, float64x2_t b)
{
  __Float64x1_t _4;
  double _5;

  <bb 2>:
  __builtin_aarch64_im_lane_boundsi (16, 8, 1);
  _5 = BIT_FIELD_REF <b_2(D), 64, 64>;
  __builtin_aarch64_im_lane_boundsi (8, 8, 0);
  _4 = BIT_FIELD_REF <_5, 64, 0>;
  return _4;
}

Is there something that the target needs to do to handle BIT_FIELD_REFs?