https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123951
--- Comment #2 from Austin <zhenhangwang at huawei dot com> ---
(In reply to Andrew Pinski from comment #1)
> testcase:
> ```
> #include "arm_neon.h"
>
> #define BUILD_TEST(TYPE1, TYPE2, Q1, Q2, SUFFIX, INDEX1, INDEX2) \
> TYPE1 __attribute__((noinline,noclone)) \
> test_copy##Q1##_lane##Q2##_##SUFFIX (TYPE1 a, TYPE2 b) \
> { \
> return vcopy##Q1##_lane##Q2##_##SUFFIX (a, INDEX1, b, INDEX2); \
> }
>
>
> BUILD_TEST (float64x2_t, float64x2_t, q, q, f64, 1, 1)
> BUILD_TEST (int64x2_t, int64x2_t, q, q, s64, 1, 1)
> BUILD_TEST (uint64x2_t, uint64x2_t, q, q, u64, 1, 1)
> /* { dg-final { scan-assembler-times "ins\\tv0.d\\\[1\\\], v1.d\\\[1\\\]" 3 } } */
>
> ```
>
> r14-3381-g27de9aa152141e .
> The issue is BFR indices are "swapped".
Since V0 and V1 are caller-saved registers, the code generated for this
function is still correct, but it does have some impact on performance. I have
just started working on GCC development and am not yet familiar with the code
base. Where should I start if I want to fix this issue?