https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120439
Bug ID: 120439
Summary: RVV: wrong tail/mask-policy when source and destination overlap with different EEW
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: camel-cdr at protonmail dot com
Target Milestone: ---

I was surprised by the following line when reading the RVV spec today:

> When source and destination registers overlap and have
> different EEW, the instruction is mask- and tail-agnostic,
> regardless of the setting of the vta and vma bits in
> vtype.
[in 30.5.2. Vector Operands]

gcc doesn't seem to respect this rule: https://godbolt.org/z/oxTs6rfcv

vuint8m1_t bar(vuint16m2_t v) {
  return __riscv_vnsrl_tu(
      __riscv_vreinterpret_u8m1(
          __riscv_vlmul_trunc_u16m1(v)),
      v, 3, 4);
}

generates

  vsetivli zero,4,e8,m1,tu,ma
  vnsrl.wi v8,v8,3
  ret

which may produce the wrong result: v8 is both the source (EEW=16) and the destination (EEW=8), so under the rule above the vnsrl.wi is mask- and tail-agnostic, while the intrinsic asked for tail undisturbed.

I'm not sure whether this can only be triggered by intrinsics or whether codegen can also produce this wrong result.
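To make the concern concrete, here is a small scalar sketch in plain C of what the two tail outcomes could look like for the overlapping vnsrl.wi above. It is only a model under stated assumptions, not GCC's or any hardware's actual behavior: VLEN is assumed to be 128 bits, and the names VLENB, tail_policy and model_vnsrl_overlap are hypothetical, introduced only for illustration. An agnostic tail may either keep the old bytes or be overwritten with all 1s; the sketch models only the all-1s outcome as the contrasting case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: VLEN = 128 bits, so one register holds
   16 bytes, and a u16m2 group (two registers) holds 16 elements. */
enum { VLENB = 16 };

/* How elements at index >= vl are treated in this model. */
enum tail_policy {
    TAIL_UNDISTURBED, /* tail keeps the old destination bytes */
    TAIL_ALL_ONES     /* one outcome an agnostic tail is allowed to produce */
};

/*
 * Scalar model of "vnsrl.wi v8, v8, shift" at e8/m1, where the destination
 * (v8) overlaps the low register of the 16-bit source group (v8-v9).
 * src holds the 16-bit source elements; dst receives the result bytes.
 */
static void model_vnsrl_overlap(uint8_t dst[VLENB], const uint16_t src[VLENB],
                                size_t vl, unsigned shift,
                                enum tail_policy policy)
{
    assert(vl <= VLENB && shift < 16);
    for (size_t i = 0; i < VLENB; i++) {
        if (i < vl) {
            /* Body element: narrowing logical shift right. */
            dst[i] = (uint8_t)(src[i] >> shift);
        } else if (policy == TAIL_UNDISTURBED) {
            /* Old destination byte i is byte i of the source group's low
               register (RISC-V element layout is little-endian). */
            dst[i] = (uint8_t)(src[i / 2] >> (8 * (i % 2)));
        } else {
            dst[i] = 0xFF;
        }
    }
}
```

Calling the model with vl = 4 and shift = 3 for both policies gives identical bytes 0..3, while bytes 4..15 differ whenever the original low register did not already hold 0xFF there. That difference is what a caller of the _tu intrinsic would observe if an implementation applied the agnostic rule.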