https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122869

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2025-12-16
     Ever confirmed|0                           |1

--- Comment #1 from Jeffrey A. Law <law at gcc dot gnu.org> ---
-24(s0) is the location of avl
-32(s0) is the location of c

You can see those in the loop header:


        sd      zero,-32(s0)    # 5     [c=4 l=4]  *movdi_64bit/3
        li      a5,1            # 6     [c=4 l=4]  *movdi_64bit/1
        sd      a5,-24(s0)      # 7     [c=4 l=4]  *movdi_64bit/3

The loop test is:
        ld      a5,-24(s0)              # 72    [c=28 l=4]  *movdi_64bit/2
        bne     a5,zero,.L3     # 73    [c=16 l=4]  *branchdi

i.e., load avl and branch to the top of the loop if it is not zero.  So while
there's a real bug in here, I don't think it's where you think it is.



Where I think things go wrong is the extraneous vsetvls.  That's going to
overwrite the VL csr.  From the body of the loop:

        vle16ff.v       v1,0(a4)
        vsetvli a4,zero,e16,mf2,tu,mu
        vsetvli a4,zero,e16,mf2,ta,ma
        vse16.v v1,0(a5)
        csrr    a4,vl


Note those two vsetvli instructions.  Those are going to overwrite the VL CSR,
which is read a couple of instructions later.

The code coming out of gimple looks somewhat sensible at first glance:

  g_12 = __riscv_vle16ff_v_i16mf2 (_1, d.0_2);
  new_vl.2_13 = __riscv_read_vl ();
  MEM[(long unsigned int *)&d] = new_vl.2_13;

But the initial expansion into RTL looks bogus.  I'm not going to quote
everything here because the RTL for our vector ops is insane, but the key
points:

This is the FF load:

(insn 44 43 45 (parallel [
            (set (reg:RVVMF2HI 163)
                (if_then_else:RVVMF2HI (unspec:RVVMF32BI [
                            (const_vector:RVVMF32BI repeat [
                                    (const_int 1 [0x1])
                                ])
                            (reg:DI 135 [ d.0_2 ])
                            (const_int 2 [0x2]) repeated x2
                            (const_int 0 [0])
                            (reg:SI 66 vl)
                            (reg:SI 67 vtype)
                        ] UNSPEC_VPREDICATE)
                    (unspec:RVVMF2HI [
                            (mem:RVVMF2HI (reg/f:DI 134 [ _1 ]) [0  S[8, 8] A16])
                        ] UNSPEC_VLEFF)
                    (unspec:RVVMF2HI [
                            (reg:DI 0 zero)
                        ] UNSPEC_VUNDEF)))
            (set (reg:SI 66 vl)
                (unspec:SI [
                        (if_then_else:RVVMF2HI (unspec:RVVMF32BI [
                                    (const_vector:RVVMF32BI repeat [
                                            (const_int 1 [0x1])
                                        ])
                                    (reg:DI 135 [ d.0_2 ])
                                    (const_int 2 [0x2]) repeated x2
                                    (const_int 0 [0])
                                    (reg:SI 66 vl)
                                    (reg:SI 67 vtype)
                                ] UNSPEC_VPREDICATE)
                            (unspec:RVVMF2HI [
                                    (mem:RVVMF2HI (reg/f:DI 134 [ _1 ]) [0  S[8, 8] A16])
                                ] UNSPEC_VLEFF)
                            (unspec:RVVMF2HI [
                                    (reg:DI 0 zero)
                                ] UNSPEC_VUNDEF))
                    ] UNSPEC_MODIFY_VL))
        ]) "j.c":7:45 -1
     (nil))

Immediately followed by:

(insn 45 44 46 (parallel [
            (set (reg:DI 165)
                (unspec:DI [
                        (reg:DI 0 zero)
                        (const_int 16 [0x10])
                        (const_int 7 [0x7])
                        (const_int 0 [0]) repeated x2
                    ] UNSPEC_VSETVL))
            (set (reg:SI 66 vl)
                (unspec:SI [
                        (reg:DI 0 zero)
                        (const_int 16 [0x10])
                        (const_int 7 [0x7])
                    ] UNSPEC_VSETVL))
            (set (reg:SI 67 vtype)
                (unspec:SI [
                        (const_int 16 [0x10])
                        (const_int 7 [0x7])
                        (const_int 0 [0]) repeated x2
                    ] UNSPEC_VSETVL))
        ]) "j.c":7:45 -1
     (nil))

(insn 46 45 0 (set (mem/c:RVVMF2HI (reg/f:DI 157) [1 g+0 S[8, 8] A64])
        (if_then_else:RVVMF2HI (unspec:RVVMF32BI [
                    (const_vector:RVVMF32BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (reg:DI 165)
                    (const_int 2 [0x2]) repeated x2
                    (const_int 1 [0x1])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (reg:RVVMF2HI 163)
            (unspec:RVVMF2HI [
                    (reg:DI 0 zero)
                ] UNSPEC_VUNDEF))) "j.c":7:45 -1
     (nil))

Those are presumably storing the resulting vector.  But note how insn 45
changes VL.  Then we have:

;; new_vl.2_13 = __riscv_read_vl ();

(insn 47 46 0 (set (reg:DI 138 [ new_vl.2_13 ])
        (zero_extend:DI (reg:SI 66 vl))) -1
     (nil))



The key point is we need to read VL before we store the resulting vector.  I
haven't dug into the expansion code, but I suspect that's where we need to
focus.  Also note that with the optimizer enabled things get cleaned up in such
a way that the code works, but I suspect that's more by accident than by
design.
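
To make the ordering problem concrete, here is a toy C model of the VL CSR.
It is only a sketch, not GCC or hardware code: the struct, the model_*
helpers and the two new_vl_* functions are all made-up names, and the VLMAX
value passed in is an assumption (4 would correspond to e16/mf2 with
VLEN=128).  The one architectural fact it relies on is that "vsetvli rd,zero"
with rd != x0 sets VL to VLMAX.

```c
#include <stddef.h>

/* Toy model of the state that matters here.  Hypothetical, not GCC code. */
struct rvv_state {
    size_t vl;     /* models the VL CSR */
    size_t vlmax;  /* VLMAX for the current vtype (assumed by the caller) */
};

/* Models vle16ff.v: a fault-only-first load may shrink VL to the number
   of elements actually loaded ("loaded" is a hypothetical input). */
static void model_vleff(struct rvv_state *s, size_t loaded)
{
    if (loaded < s->vl)
        s->vl = loaded;
}

/* Models "vsetvli rd,zero,..." with rd != x0: VL becomes VLMAX. */
static void model_vsetvli_vlmax(struct rvv_state *s)
{
    s->vl = s->vlmax;
}

/* Models "csrr rd,vl" / the (reg:SI 66 vl) read in insn 47. */
static size_t model_read_vl(const struct rvv_state *s)
{
    return s->vl;
}

/* Order seen in the unoptimized output: FF load, vsetvli for the
   store, then the VL read. */
size_t new_vl_buggy_order(size_t avl, size_t vlmax, size_t loaded)
{
    struct rvv_state s = { avl < vlmax ? avl : vlmax, vlmax };
    model_vleff(&s, loaded);
    model_vsetvli_vlmax(&s);    /* clobbers the VL left by the FF load */
    return model_read_vl(&s);   /* too late: sees VLMAX */
}

/* Order suggested above: read VL before the store's vsetvli. */
size_t new_vl_read_first(size_t avl, size_t vlmax, size_t loaded)
{
    struct rvv_state s = { avl < vlmax ? avl : vlmax, vlmax };
    model_vleff(&s, loaded);
    size_t new_vl = model_read_vl(&s);  /* capture the FF load's VL */
    model_vsetvli_vlmax(&s);            /* the store may now change VL */
    return new_vl;
}
```

With avl = 1 (as in the loop header), an assumed VLMAX of 4 and one element
loaded, new_vl_buggy_order(1, 4, 1) returns 4 while new_vl_read_first(1, 4, 1)
returns 1, which is the difference between the code we emit today and what the
source asked for.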
