https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120439

            Bug ID: 120439
           Summary: RVV: wrong tail/mask-policy when source and
                    destination overlap with different EEW
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: camel-cdr at protonmail dot com
  Target Milestone: ---

I was surprised by the following line when reading the RVV spec today:

> When source and destination registers overlap and have
> different EEW, the instruction is mask- and tail-agnostic,
> regardless of the setting of the vta and vma bits in
> vtype. [in 30.5.2. Vector Operands]

gcc doesn't seem to respect this rule: https://godbolt.org/z/oxTs6rfcv


    vuint8m1_t bar(vuint16m2_t v) {
        return __riscv_vnsrl_tu(
               __riscv_vreinterpret_u8m1(
               __riscv_vlmul_trunc_u16m1(v)), v, 3, 4);
    }

generates

    vsetivli        zero,4,e8,m1,tu,ma
    vnsrl.wi        v8,v8,3
    ret

which may produce the wrong result, as the overlapping vnsrl.wi is mask- and
tail-agnostic, while the intrinsic requires it to be tail-undisturbed.
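
To make the expected result concrete, here is a minimal scalar sketch of what
bar() should return if the _tu policy were honoured. The helper name
model_bar_tu and its parameters are hypothetical and are not part of GCC or the
intrinsics API. It assumes the usual RVV in-register byte layout (low byte of
each 16-bit element first) and vl <= n_src.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of bar() under tail-undisturbed semantics.
   src holds the n_src 16-bit elements of the vuint16m2_t argument;
   out receives the n_src 8-bit elements of the vuint8m1_t result. */
static void model_bar_tu(uint8_t *out, const uint16_t *src, size_t n_src,
                         unsigned shift, size_t vl)
{
    /* The destination operand is the low half of the source, truncated to
       m1 and reinterpreted as bytes, low byte of each element first. */
    for (size_t k = 0; k < n_src / 2; k++) {
        out[2 * k]     = (uint8_t)(src[k] & 0xff);
        out[2 * k + 1] = (uint8_t)(src[k] >> 8);
    }

    /* Body elements [0, vl): narrowing logical shift right. */
    for (size_t i = 0; i < vl; i++)
        out[i] = (uint8_t)(src[i] >> shift);

    /* Tail elements [vl, n_src) are left as initialised above, which is
       what tail-undisturbed means. */
}
```

With shift = 3 and vl = 4 as in bar(), elements 4 and up should still hold the
original low bytes of v. If the instruction is instead treated as
tail-agnostic, as the quoted spec text says, those elements may be overwritten
with all ones.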

I'm not sure if this can only be triggered by intrinsics or if codegen can also
produce this wrong result.
