https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122869
Jeffrey A. Law <law at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2025-12-16
     Ever confirmed|0                           |1
--- Comment #1 from Jeffrey A. Law <law at gcc dot gnu.org> ---
-24(s0) is the location of avl
-32(s0) is the location of c
You can see those in the loop header:
sd zero,-32(s0) # 5 [c=4 l=4] *movdi_64bit/3
li a5,1 # 6 [c=4 l=4] *movdi_64bit/1
sd a5,-24(s0) # 7 [c=4 l=4] *movdi_64bit/3
The loop test is:
ld a5,-24(s0) # 72 [c=28 l=4] *movdi_64bit/2
bne a5,zero,.L3 # 73 [c=16 l=4] *branchdi
i.e., load avl and branch to the top of the loop if it is not zero. So while
there's a real bug in here, I don't think it's where you think it is.
Where I think things go wrong is the extraneous vsetvls. Those are going to
overwrite the VL CSR. From the body of the loop:
vle16ff.v v1,0(a4)
vsetvli a4,zero,e16,mf2,tu,mu
vsetvli a4,zero,e16,mf2,ta,ma
vse16.v v1,0(a5)
csrr a4,vl
Note those two vsetvli instructions. Those are going to overwrite the VL CSR
before the csrr reads it a couple of instructions later, so the value left
there by the fault-only-first load is lost.
The code coming out of gimple looks somewhat sensible at first glance:
g_12 = __riscv_vle16ff_v_i16mf2 (_1, d.0_2);
new_vl.2_13 = __riscv_read_vl ();
MEM[(long unsigned int *)&d] = new_vl.2_13;
But the initial expansion into RTL looks bogus. I'm not going to quote
everything here because the RTL for our vector ops is insane, but the key
points:
This is the FF load:
(insn 44 43 45 (parallel [
      (set (reg:RVVMF2HI 163)
        (if_then_else:RVVMF2HI (unspec:RVVMF32BI [
              (const_vector:RVVMF32BI repeat [
                  (const_int 1 [0x1])
                ])
              (reg:DI 135 [ d.0_2 ])
              (const_int 2 [0x2]) repeated x2
              (const_int 0 [0])
              (reg:SI 66 vl)
              (reg:SI 67 vtype)
            ] UNSPEC_VPREDICATE)
          (unspec:RVVMF2HI [
              (mem:RVVMF2HI (reg/f:DI 134 [ _1 ]) [0 S[8, 8] A16])
            ] UNSPEC_VLEFF)
          (unspec:RVVMF2HI [
              (reg:DI 0 zero)
            ] UNSPEC_VUNDEF)))
      (set (reg:SI 66 vl)
        (unspec:SI [
            (if_then_else:RVVMF2HI (unspec:RVVMF32BI [
                  (const_vector:RVVMF32BI repeat [
                      (const_int 1 [0x1])
                    ])
                  (reg:DI 135 [ d.0_2 ])
                  (const_int 2 [0x2]) repeated x2
                  (const_int 0 [0])
                  (reg:SI 66 vl)
                  (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
              (unspec:RVVMF2HI [
                  (mem:RVVMF2HI (reg/f:DI 134 [ _1 ]) [0 S[8, 8] A16])
                ] UNSPEC_VLEFF)
              (unspec:RVVMF2HI [
                  (reg:DI 0 zero)
                ] UNSPEC_VUNDEF))
          ] UNSPEC_MODIFY_VL))
    ]) "j.c":7:45 -1
  (nil))
Immediately followed by:
(insn 45 44 46 (parallel [
      (set (reg:DI 165)
        (unspec:DI [
            (reg:DI 0 zero)
            (const_int 16 [0x10])
            (const_int 7 [0x7])
            (const_int 0 [0]) repeated x2
          ] UNSPEC_VSETVL))
      (set (reg:SI 66 vl)
        (unspec:SI [
            (reg:DI 0 zero)
            (const_int 16 [0x10])
            (const_int 7 [0x7])
          ] UNSPEC_VSETVL))
      (set (reg:SI 67 vtype)
        (unspec:SI [
            (const_int 16 [0x10])
            (const_int 7 [0x7])
            (const_int 0 [0]) repeated x2
          ] UNSPEC_VSETVL))
    ]) "j.c":7:45 -1
  (nil))
(insn 46 45 0 (set (mem/c:RVVMF2HI (reg/f:DI 157) [1 g+0 S[8, 8] A64])
    (if_then_else:RVVMF2HI (unspec:RVVMF32BI [
          (const_vector:RVVMF32BI repeat [
              (const_int 1 [0x1])
            ])
          (reg:DI 165)
          (const_int 2 [0x2]) repeated x2
          (const_int 1 [0x1])
          (reg:SI 66 vl)
          (reg:SI 67 vtype)
        ] UNSPEC_VPREDICATE)
      (reg:RVVMF2HI 163)
      (unspec:RVVMF2HI [
          (reg:DI 0 zero)
        ] UNSPEC_VUNDEF))) "j.c":7:45 -1
  (nil))
Insns 45 and 46 are presumably storing the resulting vector. But note how
insn 45 changes VL. Then we have:
;; new_vl.2_13 = __riscv_read_vl ();
(insn 47 46 0 (set (reg:DI 138 [ new_vl.2_13 ])
    (zero_extend:DI (reg:SI 66 vl))) -1
  (nil))
The key point is we need to read VL before we store the resulting vector. I
haven't dug into the expansion code, but I suspect that's where we need to
focus. Also note that with the optimizer enabled things get cleaned up in such
a way that the code works, but I suspect that's more by accident than by
design.