Hi,
After 27de9aa152141e7f3ee66372647d0f2cd94c4b90, there's the following regression:

FAIL: gcc.target/aarch64/vect_copy_lane_1.c scan-assembler-times ins\\tv0.s\\[1\\], v1.s\\[0\\] 3

This happens because of how the following function from vect_copy_lane_1.c is lowered:

float32x2_t
__attribute__((noinline, noclone)) test_copy_lane_f32 (float32x2_t a, float32x2_t b)
{
  return vcopy_lane_f32 (a, 1, b, 0);
}

Before 27de9aa152141e7f3ee66372647d0f2cd94c4b90, it got lowered to the following sequence in the .optimized dump:

  <bb 2> [local count: 1073741824]:
  _4 = BIT_FIELD_REF <b_3(D), 32, 0>;
  __a_5 = BIT_INSERT_EXPR <a_2(D), _4, 32>;
  return __a_5;

The above commit simplifies BIT_FIELD_REF + BIT_INSERT_EXPR to a vector permutation, so the function now gets lowered to:

  <bb 2> [local count: 1073741824]:
  __a_4 = VEC_PERM_EXPR <a_2(D), b_3(D), { 0, 2 }>;
  return __a_4;

Since we give higher priority to aarch64_evpc_zip over aarch64_evpc_ins in aarch64_expand_vec_perm_const_1, it now generates:

test_copy_lane_f32:
        zip1    v0.2s, v0.2s, v1.2s
        ret

Similarly for test_copy_lane_[us]32.
The attached patch adjusts the tests to reflect the change in code-gen, and the tests pass.
OK to commit?

Thanks,
Prathamesh

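For illustration only (not part of the patch): a minimal scalar sketch of why the selector { 0, 2 } on two-element vectors can be matched by either instruction. It assumes the usual semantics, where VEC_PERM_EXPR indexes into the concatenation of its two inputs, ins v0.s[1], v1.s[0] overwrites lane 1 of the first input with lane 0 of the second, and zip1 on .2s interleaves the low halves of both inputs. The helper names below are hypothetical and do not correspond to anything in GCC.

```c
#include <assert.h>
#include <string.h>

enum { NELTS = 2 };   /* float32x2_t has two lanes */

/* Hypothetical model of VEC_PERM_EXPR <a, b, sel>:
   out[i] = concat (a, b)[sel[i]].  */
static void
model_vec_perm (const float *a, const float *b, const unsigned *sel,
                float *out)
{
  for (unsigned i = 0; i < NELTS; i++)
    out[i] = sel[i] < NELTS ? a[sel[i]] : b[sel[i] - NELTS];
}

/* Hypothetical model of "ins v0.s[1], v1.s[0]":
   keep a, but replace lane 1 with lane 0 of b.  */
static void
model_ins_1_0 (const float *a, const float *b, float *out)
{
  memcpy (out, a, NELTS * sizeof (float));
  out[1] = b[0];
}

/* Hypothetical model of "zip1 v0.2s, v0.2s, v1.2s":
   interleave the low halves of a and b.  */
static void
model_zip1 (const float *a, const float *b, float *out)
{
  for (unsigned i = 0; i < NELTS / 2; i++)
    {
      out[2 * i] = a[i];
      out[2 * i + 1] = b[i];
    }
}

/* Returns 1 if all three models produce the same vector for the
   selector { 0, 2 } used by test_copy_lane_f32, 0 otherwise.  */
int
copy_lane_forms_agree (void)
{
  const float a[NELTS] = { 1.0f, 2.0f };
  const float b[NELTS] = { 3.0f, 4.0f };
  const unsigned sel[NELTS] = { 0, 2 };
  float perm[NELTS], ins[NELTS], zip[NELTS];

  model_vec_perm (a, b, sel, perm);
  model_ins_1_0 (a, b, ins);
  model_zip1 (a, b, zip);

  /* The permutation picks a[0] and b[0].  */
  assert (perm[0] == a[0] && perm[1] == b[0]);

  return memcmp (perm, ins, sizeof perm) == 0
         && memcmp (perm, zip, sizeof perm) == 0;
}
```

Calling copy_lane_forms_agree () from a small main should return 1. With only two lanes both forms compute { a[0], b[0] }, so which instruction is emitted depends only on the order in which the expander tries its matchers.
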
diff --git a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
index 2848be564d5..811dc678b92 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
@@ -22,7 +22,7 @@ BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
 BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
 BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
 BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
-/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
+/* { dg-final { scan-assembler-times "zip1\\tv0.2s, v0.2s, v1.2s" 3 } } */
 BUILD_TEST (int64x1_t, int64x1_t, , , s64, 0, 0)
 BUILD_TEST (uint64x1_t, uint64x1_t, , , u64, 0, 0)
 BUILD_TEST (float64x1_t, float64x1_t, , , f64, 0, 0)