https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112431
--- Comment #10 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Lehua Ding <lh...@gcc.gnu.org>:

https://gcc.gnu.org/g:a23415d7572774701d7ec04664390260ab9a3f63

commit r14-6055-ga23415d7572774701d7ec04664390260ab9a3f63
Author: Juzhe-Zhong <juzhe.zh...@rivai.ai>
Date:   Fri Dec 1 15:00:27 2023 +0800

    RISC-V: Support highpart register overlap for widen vx/vf instructions

    This patch leverages the same approach as vwcvt.

    Before this patch:

    .L5:
            add        a3,s0,s1
            add        a4,s6,s1
            add        a5,s7,s1
            vsetvli    zero,s0,e32,m4,ta,ma
            vle32.v    v16,0(s1)
            vle32.v    v12,0(a3)
            mv         s1,s2
            vle32.v    v8,0(a4)
            vle32.v    v4,0(a5)
            nop
            vfwadd.vf  v24,v16,fs0
            vfwadd.vf  v16,v12,fs0
            vs8r.v     v16,0(sp)       -----> spill
            vfwadd.vf  v16,v8,fs0
            vfwadd.vf  v8,v4,fs0
            nop
            vsetvli    zero,zero,e64,m8,ta,ma
            vfmv.f.s   fa4,v24
            vl8re64.v  v24,0(sp)       -----> reload
            vfmv.f.s   fa5,v24
            fcvt.lu.d  a0,fa4,rtz
            fcvt.lu.d  a1,fa5,rtz
            vfmv.f.s   fa4,v16
            vfmv.f.s   fa5,v8
            fcvt.lu.d  a2,fa4,rtz
            fcvt.lu.d  a3,fa5,rtz
            add        s2,s2,s5
            call       sumation
            add        s3,s3,a0
            bgeu       s4,s2,.L5

    After this patch:

    .L5:
            add        a3,s0,s1
            add        a4,s6,s1
            add        a5,s7,s1
            vsetvli    zero,s0,e32,m4,ta,ma
            vle32.v    v4,0(s1)
            vle32.v    v28,0(a3)
            mv         s1,s2
            vle32.v    v20,0(a4)
            vle32.v    v12,0(a5)
            vfwadd.vf  v0,v4,fs0
            vfwadd.vf  v24,v28,fs0
            vfwadd.vf  v16,v20,fs0
            vfwadd.vf  v8,v12,fs0
            vsetvli    zero,zero,e64,m8,ta,ma
            vfmv.f.s   fa4,v0
            vfmv.f.s   fa5,v24
            fcvt.lu.d  a0,fa4,rtz
            fcvt.lu.d  a1,fa5,rtz
            vfmv.f.s   fa4,v16
            vfmv.f.s   fa5,v8
            fcvt.lu.d  a2,fa4,rtz
            fcvt.lu.d  a3,fa5,rtz
            add        s2,s2,s5
            call       sumation
            add        s3,s3,a0
            bgeu       s4,s2,.L5

            PR target/112431

    gcc/ChangeLog:

            * config/riscv/vector.md: Support highpart overlap for vx/vf.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/base/pr112431-22.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-23.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-24.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-25.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-26.c: New test.
            * gcc.target/riscv/rvv/base/pr112431-27.c: New test.
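
For readers checking the register choices in the "After this patch" listing, here is a minimal sketch of the highpart overlap rule as I understand it from the RVV specification: a widening instruction's source group may overlap its destination group only in the highest-numbered part of the destination, and only when the source EMUL is at least 1. The helper names `reg_group` and `widen_overlap_allowed` are hypothetical and are not part of GCC. The sketch assumes integer LMUL, a destination twice as wide as the source, and register numbers already aligned to their group size.

```python
def reg_group(base, lmul):
    """Return the set of vector register numbers in a group of size lmul."""
    return set(range(base, base + lmul))


def widen_overlap_allowed(vd, vs, src_lmul):
    """Check a 2x-widening op for a legal destination/source overlap.

    Hypothetical helper: models only integer source LMUL >= 1 and
    assumes vd and vs are aligned to their respective group sizes.
    """
    dest_lmul = 2 * src_lmul
    dest = reg_group(vd, dest_lmul)
    src = reg_group(vs, src_lmul)
    if not (dest & src):
        return True  # disjoint groups never violate the rule
    # Overlap is only legal in the highest-numbered part of the destination.
    highpart = reg_group(vd + dest_lmul - src_lmul, src_lmul)
    return src == highpart


# (vd, vs) pairs of the four vfwadd.vf instructions after the patch, LMUL=4.
after_patch = [(0, 4), (24, 28), (16, 20), (8, 12)]
for vd, vs in after_patch:
    print(f"vfwadd.vf v{vd},v{vs}: allowed={widen_overlap_allowed(vd, vs, 4)}")
```

Each pair after the patch places the m4 source in the upper half of the m8 destination, which is what lets four widened results fit in the register file without the spill and reload seen before the patch. A lowpart overlap such as `vd=16, vs=16` is rejected by the same check.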