https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121959
Bug ID: 121959
Summary: riscv: vector sign extend instead of zero extend
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rdapp at gcc dot gnu.org
CC: pan2.li at intel dot com
Target Milestone: ---
Target: riscv

I haven't analyzed this in detail yet, but figured I'd open a PR for tracking purposes.

The following example, extracted from x264's satd, compiled with -O3 -march=rv64gcv

#include <stdint.h>

void lul(
    int *restrict res,
    uint8_t *restrict a,
    uint8_t *restrict b,
    int n)
{
    for (int i = 0; i < n; i++)
    {
        res[i] = (a[i] - b[i]) << 16;
    }
}

results in

.L3:
        vsetvli a5,a3,e8,mf4,ta,ma
        vle8.v  v1,0(a2)
        vle8.v  v3,0(a1)
        slli    a4,a5,2
        sub     a3,a3,a5
        add     a1,a1,a5
        add     a2,a2,a5
        vwsubu.vv       v2,v3,v1
        vsetvli zero,zero,e32,m1,ta,ma
        vsext.vf2       v1,v2
        vsll.vi v1,v1,16
        vse32.v v1,0(a0)
        add     a0,a0,a4
        bne     a3,zero,.L3

which is reasonable. LLVM, however, produces:

        ...
        vzext.vf2       v8, v10
        vsll.vi v8, v8, 16

which can be combined into vwsll (vector widening shift left).

vwsll zero-extends, so we cannot combine a sign-extend + left shift. Left-shifting a negative number is undefined, but I'm not sure we can make use of that here.

.optimized:

  vect__3.8_93 = .MASK_LEN_LOAD (vectp_a.6_90, 8B, { -1, ... }, _92(D), _112, 0);
  vect_patt_31.9_94 = (vector([4,4]) unsigned short) vect__3.8_93;
  vect__6.12_99 = .MASK_LEN_LOAD (vectp_b.10_96, 8B, { -1, ... }, _98(D), _112, 0);
  vect_patt_29.13_100 = (vector([4,4]) unsigned short) vect__6.12_99;
  vect_patt_27.14_101 = vect_patt_31.9_94 - vect_patt_29.13_100;
  vect_patt_26.15_102 = VIEW_CONVERT_EXPR<vector([4,4]) signed short>(vect_patt_27.14_101);
  vect_patt_22.16_103 = (vector([4,4]) int) vect_patt_26.15_102;
  vect__11.17_104 = vect_patt_22.16_103 << 16;
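
A possibly relevant observation on the sign-extend vs. zero-extend question (an illustration only, not part of the analysis above): in this example the shift amount (16) equals the number of bits added by the 16-to-32-bit extension. The sign-extended and zero-extended values can differ only in their upper 16 bits, and those bits are shifted out of the 32-bit result. If that reasoning holds, the two sequences should agree for every input without relying on undefined behavior. The scalar sketch below checks this exhaustively for one lane. The helper names are hypothetical, and the shift is done on uint32_t on the assumption that modular 32-bit arithmetic is the right model for vsll.vi.

```c
#include <stdint.h>

/* Model of vsext.vf2 + vsll.vi 16 on one lane: sign-extend a 16-bit
   value to 32 bits, then shift left by 16.  The shift is performed on
   uint32_t so that this C code does not itself left-shift a negative
   value.  The uint16_t -> int16_t conversion is implementation-defined
   in C; GCC and Clang reduce it modulo 2^16, which is assumed here. */
static uint32_t sext_then_shift(uint16_t x)
{
    int32_t wide = (int16_t)x;
    return (uint32_t)wide << 16;
}

/* Model of vzext.vf2 + vsll.vi 16 on one lane: zero-extend, then
   shift left by 16. */
static uint32_t zext_then_shift(uint16_t x)
{
    uint32_t wide = x;
    return wide << 16;
}

/* Compare both models over all 65536 possible 16-bit inputs and
   return the number of inputs for which the results differ. */
static unsigned count_mismatches(void)
{
    unsigned mismatches = 0;
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        if (sext_then_shift((uint16_t)v) != zext_then_shift((uint16_t)v))
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches() from a small test driver is expected to return 0. Note that this only covers a shift amount of exactly 16 (or more generally, at least the number of extension bits); for smaller shift amounts some of the extended bits survive and the two forms differ for negative inputs.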