https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119348

            Bug ID: 119348
           Summary: risc-v vector tuple casting optimization regression
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: shuizhuyuanluo at gmail dot com
  Target Milestone: ---

Because the RISC-V vector intrinsics do not provide a direct way to convert a
tuple to a vector group, I implemented the utility function shown below.

It should compile to a no-op when the tuple and the vector group occupy the
same registers.

gcc 14.2 with -O2 or -O3 generates lots of vmv1r.v instructions
gcc 14.2 with -Os or -Oz generates a single ret instruction
clang 18/19/20 generates a single ret instruction

However, the gcc master branch generates lots of vmv1r.v instructions
regardless of whether -O2, -O3, -Os or -Oz is used.

riscv64-unknown-linux-gnu-gcc -march=rv64gcv -Os opt.c -c -S -o opt-os.s

This issue can also be reproduced on godbolt.org.

```c
#include <riscv_vector.h>

vfloat32m8_t convert_vfloat32m1x8_to_vfloat32m8(vfloat32m1x8_t tuple)
{
    vfloat32m1_t v0 = __riscv_vget_v_f32m1x8_f32m1(tuple, 0);
    vfloat32m1_t v1 = __riscv_vget_v_f32m1x8_f32m1(tuple, 1);
    vfloat32m1_t v2 = __riscv_vget_v_f32m1x8_f32m1(tuple, 2);
    vfloat32m1_t v3 = __riscv_vget_v_f32m1x8_f32m1(tuple, 3);
    vfloat32m1_t v4 = __riscv_vget_v_f32m1x8_f32m1(tuple, 4);
    vfloat32m1_t v5 = __riscv_vget_v_f32m1x8_f32m1(tuple, 5);
    vfloat32m1_t v6 = __riscv_vget_v_f32m1x8_f32m1(tuple, 6);
    vfloat32m1_t v7 = __riscv_vget_v_f32m1x8_f32m1(tuple, 7);

    vfloat32m8_t result = __riscv_vundefined_f32m8();
    result = __riscv_vset_v_f32m1_f32m8(result, 0, v0);
    result = __riscv_vset_v_f32m1_f32m8(result, 1, v1);
    result = __riscv_vset_v_f32m1_f32m8(result, 2, v2);
    result = __riscv_vset_v_f32m1_f32m8(result, 3, v3);
    result = __riscv_vset_v_f32m1_f32m8(result, 4, v4);
    result = __riscv_vset_v_f32m1_f32m8(result, 5, v5);
    result = __riscv_vset_v_f32m1_f32m8(result, 6, v6);
    result = __riscv_vset_v_f32m1_f32m8(result, 7, v7);
    return result;
}
```
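
To make the no-op expectation concrete, here is a rough scalar model of the layout argument. It is only a sketch and not real RVV code. The `model_*` names are hypothetical. It assumes VLEN is 128 bits, so one register holds 4 floats. It also assumes that both the tuple and the LMUL=8 group occupy the same 8 consecutive registers. Under those assumptions the vget/vset sequence copies each field onto itself, so no moves should be needed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scalar model, not RVV code. Assumes VLEN = 128 bits,
   i.e. 4 floats per vector register. */
enum { LANES_PER_REG = 4, NFIELDS = 8 };

/* Both types are modelled as 8 consecutive registers (an assumption). */
typedef struct { float reg[NFIELDS][LANES_PER_REG]; } model_f32m1x8; /* tuple */
typedef struct { float reg[NFIELDS][LANES_PER_REG]; } model_f32m8;   /* LMUL=8 group */

/* Mirrors the vget/vset sequence in the report: tuple field i goes to
   part i of the group. */
static model_f32m8 model_convert(model_f32m1x8 tuple)
{
    model_f32m8 result;
    for (int i = 0; i < NFIELDS; i++)
        memcpy(result.reg[i], tuple.reg[i], sizeof result.reg[i]);
    return result;
}

/* Returns 1 if the converted group is byte-identical to the input tuple. */
static int model_conversion_is_identity(void)
{
    model_f32m1x8 tuple;
    for (int i = 0; i < NFIELDS; i++)
        for (int j = 0; j < LANES_PER_REG; j++)
            tuple.reg[i][j] = (float)(i * LANES_PER_REG + j);

    model_f32m8 group = model_convert(tuple);
    return memcmp(&group, &tuple, sizeof tuple) == 0;
}
```

Calling `model_conversion_is_identity()` from a test harness checks that the modelled conversion leaves every byte in place. This matches the single `ret` that clang and gcc 14.2 at -Os emit for the real function.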
