https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123951

--- Comment #2 from Austin <zhenhangwang at huawei dot com> ---
(In reply to Andrew Pinski from comment #1)
> testcase:
> ```
> #include "arm_neon.h"
> 
> #define BUILD_TEST(TYPE1, TYPE2, Q1, Q2, SUFFIX, INDEX1, INDEX2)        \
> TYPE1 __attribute__((noinline,noclone))                                 \
> test_copy##Q1##_lane##Q2##_##SUFFIX (TYPE1 a, TYPE2 b)                  \
> {                                                                       \
>   return vcopy##Q1##_lane##Q2##_##SUFFIX (a, INDEX1, b, INDEX2);        \
> }
> 
> 
> BUILD_TEST (float64x2_t, float64x2_t, q, q, f64, 1, 1)
> BUILD_TEST (int64x2_t,   int64x2_t,   q,  q, s64, 1, 1)
> BUILD_TEST (uint64x2_t,  uint64x2_t,  q, q, u64, 1, 1)
> /* { dg-final { scan-assembler-times "ins\\tv0.d\\\[1\\\], v1.d\\\[1\\\]" 3 } } */
> 
> ```
> 
> r14-3381-g27de9aa152141e .
> The issue is that the BFR indices are "swapped".

Because V0 and V1 are caller-saved registers, the code generated for this
function is not incorrect, but it does have some impact on performance. I have just started
working on GCC development and am not very familiar with the code. If I want to
fix this issue, where should I start?
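
For reference, here is my understanding of what the test case expects, written
as a small portable C model rather than real NEON code. The type
`model_int64x2` and the function `model_vcopyq_laneq_s64` are hypothetical names
made up for this sketch; they only mimic the documented behaviour of
`vcopyq_laneq_s64` (copy one lane of `b` into one lane of `a`) and say nothing
about how GCC implements it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector with two 64-bit lanes.
   This is NOT the real int64x2_t from arm_neon.h.  */
typedef struct { int64_t lane[2]; } model_int64x2;

/* Model of vcopyq_laneq_s64 (a, lane1, b, lane2): the result is A with
   lane LANE1 replaced by lane LANE2 of B.  All other lanes of A are kept.  */
static model_int64x2
model_vcopyq_laneq_s64 (model_int64x2 a, int lane1,
                        model_int64x2 b, int lane2)
{
  /* The real intrinsic requires constant, in-range lane numbers.  */
  assert (lane1 >= 0 && lane1 < 2);
  assert (lane2 >= 0 && lane2 < 2);
  a.lane[lane1] = b.lane[lane2];
  return a;
}
```

With `a` arriving in v0 and `b` in v1, the call with both lanes equal to 1
should only need to overwrite the upper lane of v0, which is what the
`ins v0.d[1], v1.d[1]` pattern in the dg-final line checks for.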
