https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378
            Bug ID: 120378
           Summary: Support narrowing clip idiom
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rdapp at gcc dot gnu.org
  Target Milestone: ---
            Target: riscv

x264 contains a variation of the following loop (in hpel_filter):

typedef unsigned char uint8_t;
typedef short int16_t;

inline uint8_t x264_clip_uint8 (int x)
{
  return x & (~255) ? (-x) >> 31 : x;
}

void __attribute__ ((noipa))
x264_clip_loop (uint8_t *res, int *x, int w)
{
  for (int i = 0; i < w; i++)
    res[i] = x264_clip_uint8 (x[i]);
}

Currently we generate:

.L4:
  vsetvli a5,a2,e32,m1,ta,mu
  vle32.v v1,0(a1)
  sub a2,a2,a5
  sh2add a1,a5,a1
  vmsgtu.vv v0,v1,v3
  vrsub.vi v2,v1,0
  vsra.vi v1,v2,31,v0.t
  vsetvli zero,zero,e16,mf2,ta,ma
  vnsrl.wi v1,v1,0
  vsetvli zero,zero,e8,mf4,ta,ma
  vnsrl.wi v1,v1,0
  vse8.v v1,0(a0)
  add a0,a0,a5
  bne a2,zero,.L4

That's a literal vectorization of the code and not bad; however, clang does a
bit better here by making use of vnclipu.  It clamps negative values with
vmax.vx against zero and then narrows with two unsigned saturating vnclipu.wi
steps:

.LBB0_13:                               # =>This Inner Loop Header: Depth=1
  vl2re32.v v8, (a5)
  vsetvli a3, zero, e32, m2, ta, ma
  vmax.vx v8, v8, zero
  vsetvli zero, zero, e16, m1, ta, ma
  vnclipu.wi v10, v8, 0
  vsetvli zero, zero, e8, mf2, ta, ma
  vnclipu.wi v8, v10, 0
  vse8.v v8, (a4)
  add a5, a5, t0
  add a4, a4, a7
  bne a4, t1, .LBB0_13

The ifcvt'ed code before vect is:

  _4 = *_3;
  x.0_12 = (unsigned int) _4;
  _38 = -x.0_12;
  _15 = (int) _38;
  _16 = _15 >> 31;
  _29 = x.0_12 > 255;
  _17 = _29 ? _16 : _4;
  _18 = (unsigned char) _17;

I guess that's a case for match.pd and vect patterns.  I'm just not sure yet
how to properly recognize the idiom, as we need to ensure that _15's sign bit
is set.
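A scalar C sketch of the three forms in this report may help when thinking about the pattern: the x264 source, a model of the ifcvt'ed GIMPLE, and a model of clang's vmax.vx + vnclipu.wi sequence. All function names are hypothetical, and the code assumes GCC's documented implementation-defined behavior (arithmetic right shift of negative ints, modulo conversion of out-of-range unsigned values to int). The helper checks that the forms agree on a range that excludes INT_MIN. One observation from the model, not stated in the report: because the GIMPLE negates in unsigned arithmetic, it is defined for INT_MIN and yields 255 there, while the max-then-saturate form yields 0. So a pattern matching the GIMPLE alone would presumably have to rule that input out, which appears related to the _15 sign-bit question.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Source form from the report.  -x overflows (UB) for INT_MIN, so that
   input is excluded; >> on a negative int is implementation-defined in
   ISO C and arithmetic in GCC.  */
static uint8_t
clip_source (int x)
{
  assert (x != INT_MIN);
  return x & (~255) ? (-x) >> 31 : x;
}

/* Model of the ifcvt'ed GIMPLE: the negation is done in unsigned
   arithmetic, so it is defined for every input, including INT_MIN.  */
static uint8_t
clip_ifcvt (int v4)
{
  unsigned int x0 = (unsigned int) v4;   /* x.0_12 = (unsigned int) _4 */
  unsigned int t38 = -x0;                /* _38 = -x.0_12 */
  int t15 = (int) t38;                   /* _15 = (int) _38 (modulo in GCC) */
  int t16 = t15 >> 31;                   /* _16: 0 or -1 */
  int t17 = x0 > 255 ? t16 : v4;         /* _17 = _29 ? _16 : _4 */
  return (uint8_t) t17;                  /* _18 = (unsigned char) _17 */
}

/* Model of the clang sequence: signed max with zero, then two unsigned
   saturating narrowings 32->16->8 bits (vnclipu.wi with shift 0, so the
   fixed-point rounding mode has no effect).  */
static uint8_t
clip_max_narrow (int x)
{
  uint32_t u = x > 0 ? (uint32_t) x : 0u;                   /* vmax.vx */
  uint16_t h = u > UINT16_MAX ? UINT16_MAX : (uint16_t) u;  /* e32->e16 */
  return h > UINT8_MAX ? UINT8_MAX : (uint8_t) h;           /* e16->e8 */
}

/* Return 1 if all three forms agree on every x in [lo, hi].
   Requires lo > INT_MIN (clip_source asserts on INT_MIN).  */
static int
forms_agree (int lo, int hi)
{
  /* long long loop counter so that hi == INT_MAX does not overflow.  */
  for (long long v = lo; v <= hi; v++)
    {
      int x = (int) v;
      uint8_t a = clip_source (x);
      if (a != clip_ifcvt (x) || a != clip_max_narrow (x))
        return 0;
    }
  return 1;
}
```

For example, forms_agree (-70000, 70000) returns 1, while clip_ifcvt (INT_MIN) returns 255 and clip_max_narrow (INT_MIN) returns 0.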