https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114429
Bug ID: 114429
Summary: [x86] (neg a) ashiftrt>> 31 can be optimized to a > 0.
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: liuhongt at gcc dot gnu.org
Target Milestone: ---

typedef unsigned char uint8_t;

uint8_t x264_clip_uint8( int x )
{
  return x&(~255) ? (-x)>>31 : x;
}

void
foo (int* a, int* __restrict b, int n)
{
  for (int i = 0; i != 8; i++)
    b[i] = x264_clip_uint8 (a[i]);
}

gcc -O2 -march=x86-64-v3 -S

foo(int*, int*, int):
..
        mov     eax, 255
        vpxor   xmm0, xmm0, xmm0
        vmovd   xmm1, eax
        vpbroadcastd    ymm1, xmm1
        vmovdqu ymm2, YMMWORD PTR [rdi]
        vpminud ymm3, ymm2, ymm1
        vpsubd  ymm0, ymm0, ymm2
        vmovdqa YMMWORD PTR [rsp-32], ymm3
        vpsrad  ymm0, ymm0, 31
        vpcmpeqd        ymm3, ymm2, YMMWORD PTR [rsp-32]
        vpblendvb       ymm0, ymm0, ymm2, ymm3
        vpand   ymm1, ymm1, ymm0
        vmovdqu YMMWORD PTR [rsi], ymm1

It can be better with a single signed integer compare against zero in place
of the vpsubd + vpsrad pair:

        mov     eax, 255
        vmovd   xmm1, eax
        vpxor   xmm0, xmm0, xmm0
        vpbroadcastd    ymm1, xmm1
        vmovdqu ymm2, YMMWORD PTR [rdi]
        vpminud ymm3, ymm2, ymm1
        vmovdqa YMMWORD PTR [rsp-32], ymm3
        vpcmpgtd        ymm0, ymm2, ymm0
        vpcmpeqd        ymm3, ymm2, YMMWORD PTR [rsp-32]
        vpblendvb       ymm0, ymm0, ymm2, ymm3
        vpand   ymm1, ymm1, ymm0
        vmovdqu YMMWORD PTR [rsi], ymm1
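For reference, a minimal scalar sketch of the identity the request relies on: for a 32-bit int x other than INT_MIN, (-x)>>31 and -(x > 0) give the same all-ones/zero mask. The helper names below (neg_sra31, gt_zero_mask, clip_orig, clip_cmp, count_mismatches) are hypothetical and not part of the report. The sketch assumes a 32-bit int and that >> on a negative int is an arithmetic shift, as GCC documents. INT_MIN is excluded because -x overflows there, which is undefined in C; that is presumably what allows the compiler to ignore that input.

```c
#include <assert.h>
#include <limits.h>

/* Form written in the report: arithmetic right shift of the negation.
   Assumes 32-bit int and sign-propagating >> (GCC's documented behavior). */
static int neg_sra31(int x)
{
    return (-x) >> 31;
}

/* Proposed form: all-ones mask when x > 0, zero otherwise. */
static int gt_zero_mask(int x)
{
    return -(x > 0);
}

/* Scalar clip using the original expression. */
static unsigned char clip_orig(int x)
{
    return (unsigned char)((x & ~255) ? neg_sra31(x) : x);
}

/* Scalar clip using the comparison mask instead. */
static unsigned char clip_cmp(int x)
{
    return (unsigned char)((x & ~255) ? gt_zero_mask(x) : x);
}

/* Count inputs in [lo, hi] on which the two forms disagree.
   lo must be greater than INT_MIN, since -INT_MIN overflows. */
static int count_mismatches(int lo, int hi)
{
    int bad = 0;

    assert(lo > INT_MIN);
    /* long long counter so the loop terminates even when hi == INT_MAX. */
    for (long long x = lo; x <= hi; x++) {
        int v = (int)x;
        if (neg_sra31(v) != gt_zero_mask(v) || clip_orig(v) != clip_cmp(v))
            bad++;
    }
    return bad;
}
```

Calling count_mismatches over any range that excludes INT_MIN should return 0 under the stated assumptions; this only checks the scalar identity and says nothing about where in GCC the transformation would be implemented.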