https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122558
Bug ID: 122558
Summary: RISC-V: inefficient lmul selection for 4-element vector operations
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: chenzhongyao.hit at gmail dot com
CC: juzhe.zhong at rivai dot ai, law at gcc dot gnu.org,
rdapp at gcc dot gnu.org
Target Milestone: ---
Target: riscv
https://godbolt.org/z/a9z671v48
For 4×16-bit element operations, GCC generates:
vsetivli zero,4,e16,m2,ta,ma
This reserves a two-register group (2×VLEN, i.e. 256 bits on a VLEN=128
target) for only 64 bits of data. On most microarchitectures, I believe
vector instructions executed with LMUL=m2 cost more than with m1. When the
effective data size is below VLEN, selecting m2 is unnecessary and hurts
performance.
Expected behavior: the vsetivli should select mf2 or m1.
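
The size arithmetic behind this expectation can be sketched as follows. This is only an illustration, not GCC's actual LMUL selection logic: the helper names min_lmul_eighths and lmul_name are hypothetical, and ELEN=64 is an assumption about the target. LMUL is counted in eighths so that fractional settings stay integral.

```c
#include <assert.h>

/* LMUL in eighths: 1 = mf8, 2 = mf4, 4 = mf2, 8 = m1, 16 = m2, 32 = m4, 64 = m8. */
static const char *lmul_name(int eighths)
{
    switch (eighths) {
    case 1:  return "mf8";
    case 2:  return "mf4";
    case 4:  return "mf2";
    case 8:  return "m1";
    case 16: return "m2";
    case 32: return "m4";
    case 64: return "m8";
    default: return "none";
    }
}

/* Smallest LMUL (in eighths) whose register group holds n_elems elements of
   sew_bits each, for a given VLEN and ELEN. Returns -1 if even m8 is too
   small. Hypothetical helper for illustration only. */
static int min_lmul_eighths(unsigned vlen_bits, unsigned elen_bits,
                            unsigned sew_bits, unsigned n_elems)
{
    assert(vlen_bits > 0 && elen_bits > 0 && sew_bits > 0 && n_elems > 0);

    unsigned long data_bits = (unsigned long)sew_bits * n_elems;

    for (int l = 1; l <= 64; l *= 2) {
        /* Capacity check: VLEN * LMUL >= data size, scaled by 8. */
        int fits = (unsigned long)vlen_bits * l >= 8 * data_bits;
        /* A fractional LMUL must still satisfy SEW <= LMUL * ELEN. */
        int sew_ok = (unsigned long)elen_bits * l >= 8UL * sew_bits;
        if (fits && sew_ok)
            return l;
    }
    return -1;
}
```

For the case in this report, lmul_name(min_lmul_eighths(128, 64, 16, 4)) yields "mf2": four 16-bit elements occupy 64 bits, which is half of one 128-bit register, so neither m2 nor even m1 is needed for capacity.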