https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122277
Bug ID: 122277
Summary: Costing VF 1 instead of 4 since r16-4411-gb6e802fd55d37e
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rdapp at gcc dot gnu.org
Target Milestone: ---
Target: riscv
Since r16-4411-gb6e802fd55d37e I'm seeing different vectorization for x264
SATD.
Source is in gcc.target/riscv/rvv/autovec/pr118019-2.c.
Built with -O3 -march=rv64gcv_zvl512b x264-pixel-satd-8x4.c
-mno-vector-strict-align
We now choose a VLA mode (RVVM1SI) for vectorizing the second loop
  for (int i = 0; i < 4; i++)
    {
      HADAMARD4 (a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i]);
      sum += abs2 (a0) + abs2 (a1) + abs2 (a2) + abs2 (a3);
    }
instead of V4QI.
Costing prefers RVVM1SI due to its VF={16, 16} (= 4 via estimate in
compare_inside_loop_cost) while V4QI is now considered to have VF=1.
Before this revision, V4QI also had VF=4.
Thus, we scale the V4QI loop costs by 4 and the RVVM1SI costs by 1, making
RVVM1SI appear cheaper.
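The effect of the VF on the comparison can be sketched as follows. This is a
simplified model, not GCC's actual code: clamp_vf and compare_inside_cost are
hypothetical helpers, and it assumes the comparison weighs body cost per
scalar iteration by cross-multiplying each candidate's cost with the other
candidate's VF.

```c
#include <assert.h>

/* Hypothetical helper: limit a VF estimate to the loop's likely maximum
   iteration count.  Pass max_niter == -1 when no estimate is available.  */
static long clamp_vf(long vf, long max_niter)
{
    return (max_niter != -1 && vf >= max_niter) ? max_niter : vf;
}

/* Hypothetical helper: compare cost_a / vf_a against cost_b / vf_b without
   division, by scaling each body cost with the other candidate's VF.
   Returns -1 if candidate A is cheaper, 1 if B is cheaper, 0 if equal.  */
static int compare_inside_cost(long cost_a, long vf_a, long cost_b, long vf_b)
{
    assert(vf_a > 0 && vf_b > 0);
    long rel_a = cost_a * vf_b;   /* A's cost scaled by B's VF */
    long rel_b = cost_b * vf_a;   /* B's cost scaled by A's VF */
    return (rel_a > rel_b) - (rel_a < rel_b);
}
```

With made-up body costs of 20 for V4QI and 60 for RVVM1SI (illustrative
numbers only, not taken from a dump), and the RVVM1SI VF clamped from 16 to 4:
compare_inside_cost (20, 1, 60, 4) compares 80 against 60 and prefers RVVM1SI,
whereas compare_inside_cost (20, 4, 60, 4) compares 80 against 240 and prefers
V4QI.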
I can add more later, just dumping the initial info I have available right now,
am a bit distracted today.