https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124448

            Bug ID: 124448
           Summary: RISC-V: Extract from tuple vector with lmul<1 may
                    cause bad code generation
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wangzicong at masscore dot cn
  Target Milestone: ---

https://godbolt.org/z/711rb6jPj
In this case, with -mrvv-vector-bits=zvl, the vectorizer picks RVVMF4QI, and
the generated code then contains some unnecessary extract operations.

https://godbolt.org/z/PjfaM57sb
With -mrvv-vector-bits=scalable, the vector mode is RVVM2HI, which is more
efficient. This appears to be related to vector costing: when zvl is specified,
the cost of RVVMF4QI is slightly lower than that of RVVM2HI. That in turn is
due to the difference between full and partial vectors.

However, I believe the underlying problem is the extraction from a tuple vector
when lmul < 1. With vl = 4, converting RVVMF2x4HI into four RVVMF2HI subregs
takes an extra extract_first plus slide operations.
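
To make the slide amount in the RVVMF2x4HI dump concrete, here is a small
arithmetic sketch. It is not GCC code: the helper names field_bytes and
slide_amount are hypothetical, and it assumes the tuple fields are packed back
to back when the tuple is viewed as RVVM2DI, which is what the slide count of 1
in insn 11 suggests. Each field is a vector(4) unsigned short, and the view
uses 8-byte DI elements.

```c
#include <stddef.h>

/* Size in bytes of one tuple field that holds `nelems` elements of
   `elem_bytes` bytes each. */
size_t field_bytes(size_t nelems, size_t elem_bytes)
{
    return nelems * elem_bytes;
}

/* Number of `view_elem_bytes`-wide elements to slide down so that field
   `index` starts at element 0 of the wider view.
   Assumption (hypothetical model): fields are packed back to back.
   Returns (size_t)-1 when the field does not start on a view-element
   boundary, i.e. a slide alone could not isolate it. */
size_t slide_amount(size_t index, size_t fbytes, size_t view_elem_bytes)
{
    size_t offset = index * fbytes;

    if (offset % view_elem_bytes != 0)
        return (size_t)-1;
    return offset / view_elem_bytes;
}
```

Under these assumptions, field_bytes(4, 2) is 8 bytes, exactly one DI element,
so slide_amount(1, 8, 8) is 1. That matches the (const_int 1) slide operand in
insn 11, which is followed by the extract of element 0 in insn 12.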

RTL for extraction from RVVMF2x4HI:

;; vect_patt_365.74_467 = VIEW_CONVERT_EXPR<vector(4) unsigned short>(vect__5.53_448);

(insn 8 7 9 (set (reg:DI 181)
        (unspec:DI [
                (vec_select:DI (subreg:RVVM2DI (reg:RVVMF2x4HI 176 [ vect_array.51D.3032 ]) 0)
                    (parallel [
                            (const_int 0 [0])
                        ]))
                (reg:SI 67 vtype)
            ] UNSPEC_VPREDICATE)) "dct.c":8:20 -1
     (nil))

(insn 9 8 0 (set (reg:RVVMF2HI 164 [ vect_patt_365.74D.3056 ])
        (subreg:RVVMF2HI (reg:DI 181) 0)) -1
     (nil))

;; vect_patt_370.80_473 = VIEW_CONVERT_EXPR<vector(4) unsigned short>(vect__5.55_449);

(insn 10 9 11 (set (reg:DI 184)
        (unspec:DI [
                (const_int 32 [0x20])
            ] UNSPEC_VLMAX)) "dct.c":8:20 -1
     (nil))

(insn 11 10 12 (set (reg:RVVM2DI 183)
        (unspec:RVVM2DI [
                (unspec:RVVMF32BI [
                        (const_vector:RVVMF32BI [
                                (const_int 1 [0x1]) repeated x4
                            ])
                        (reg:DI 184)
                        (const_int 2 [0x2]) repeated x2
                        (const_int 1 [0x1])
                        (reg:SI 66 vl)
                        (reg:SI 67 vtype)
                    ] UNSPEC_VPREDICATE)
                (unspec:RVVM2DI [
                        (reg:DI 0 zero)
                    ] UNSPEC_VUNDEF)
                (subreg:RVVM2DI (reg:RVVMF2x4HI 176 [ vect_array.51D.3032 ]) 0)
                (const_int 1 [0x1])
            ] UNSPEC_VSLIDEDOWN)) "dct.c":8:20 -1
     (nil))

(insn 12 11 13 (set (reg:DI 182)
        (unspec:DI [
                (vec_select:DI (reg:RVVM2DI 183)
                    (parallel [
                            (const_int 0 [0])
                        ]))
                (reg:SI 67 vtype)
            ] UNSPEC_VPREDICATE)) "dct.c":8:20 -1
     (nil))

(insn 13 12 0 (set (reg:RVVMF2HI 167 [ vect_patt_370.80D.3062 ])
        (subreg:RVVMF2HI (reg:DI 182) 0)) -1
     (nil))

RTL for extraction from RVVM2x4HI:

;; vect_patt_401.74_702 = VIEW_CONVERT_EXPR<vector([16,16]) unsigned short>(vect__5.53_683);

(insn 8 7 0 (set (reg:RVVM2HI 164 [ vect_patt_401.74D.3056 ])
        (subreg:RVVM2HI (reg:RVVM2x4HI 176 [ vect_array.51D.3032 ]) 0)) -1
     (nil))

;; vect_patt_406.80_708 = VIEW_CONVERT_EXPR<vector([16,16]) unsigned short>(vect__5.55_684);

(insn 9 8 0 (set (reg:RVVM2HI 167 [ vect_patt_406.80D.3062 ])
        (subreg:RVVM2HI (reg:RVVM2x4HI 176 [ vect_array.51D.3032 ]) [32, 32])) -1
     (nil))
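
For comparison, the [32, 32] subreg offset in the RVVM2x4HI case can be
reproduced with the sketch below. It reads the [c0, c1] notation in the dump as
c0 + c1 * x for a runtime indeterminate x. The struct poly1 type and its
helpers are hypothetical names for illustration, not GCC's poly_int API, and
how x relates to the hardware vector length is not taken from this report.

```c
/* A degree-1 polynomial c0 + c1 * x, printed in the dumps as [c0, c1]. */
struct poly1 {
    unsigned long c0;
    unsigned long c1;
};

/* Multiply both coefficients by a compile-time constant k. */
struct poly1 poly1_scale(struct poly1 p, unsigned long k)
{
    struct poly1 r = { p.c0 * k, p.c1 * k };
    return r;
}

/* Evaluate the polynomial for a concrete value of the indeterminate x. */
unsigned long poly1_eval(struct poly1 p, unsigned long x)
{
    return p.c0 + p.c1 * x;
}

/* Byte offset of tuple field `index` when each field has `nelems`
   elements of `elem_bytes` bytes and fields are laid out back to back. */
struct poly1 field_offset_bytes(struct poly1 nelems,
                                unsigned long elem_bytes,
                                unsigned long index)
{
    return poly1_scale(nelems, elem_bytes * index);
}
```

With vector([16,16]) unsigned short fields, field 1 starts at
field_offset_bytes({16, 16}, 2, 1), which is [32, 32], the offset in insn 9.
Here each field is reachable with a plain subreg, so no slide or extract is
emitted.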

Is it possible to make an RVVMF2HI subreg refer directly to one of the fields
of RVVMF2x4HI?