https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122558

            Bug ID: 122558
           Summary: RISC-V: inefficient lmul selection for 4-element
                    vector operations
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: chenzhongyao.hit at gmail dot com
                CC: juzhe.zhong at rivai dot ai, law at gcc dot gnu.org,
                    rdapp at gcc dot gnu.org
  Target Milestone: ---
            Target: riscv

https://godbolt.org/z/a9z671v48

For 4×16-bit element operations, GCC generates:
  vsetivli zero,4,e16,m2,ta,ma

This allocates a register group of 2×VLEN (256 bits on a 128-bit VLEN target)
for only 64 bits of data. On most microarchitectures, I believe, vector
instructions at LMUL=m2 have a higher execution cost than at m1. When the
effective data size is below VLEN, an m2 configuration is unnecessary and can
cause a performance regression.

Expected behavior: a vsetivli using mf2 or m1.
