On Tue, May 19, 2026 at 8:22 AM Roger Sayle <[email protected]> wrote:
>
> This patch, inspired by PR target/90483 and libstdc++/118416, implements
> some RTL expansion-time simplifications of ptest. A common idiom for
> testing a vector against zero is to use ptestz(mask,-1). Alas, the code
> generated for this is suboptimal, requiring materialization of an all_ones
> vector. Given that ptestz(x,y) is defined as (x & y) == 0, an equivalent
> form is ptestz(mask,mask), saving an instruction (if ~0 isn't available).
>
> Consider the function:
>
> typedef long long v2di __attribute__ ((__vector_size__ (16)));
>
> int foo (v2di x)
> {
>   return __builtin_ia32_ptestz128(x,~(v2di){0,0});
> }
>
> with -O2 -mavx2, GCC currently generates:
>
> foo:    vpcmpeqd        %xmm1, %xmm1, %xmm1
>         xorl    %eax, %eax
>         vptest  %xmm1, %xmm0
>         sete    %al
>         ret
>
> with this patch, it now generates:
>
> foo:    xorl    %eax, %eax
>         vptest  %xmm0, %xmm0
>         sete    %al
>         ret
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
>
> 2026-05-18 Roger Sayle <[email protected]>
>
> gcc/ChangeLog
> 	PR target/90483
> 	PR libstdc++/118416
> 	* config/i386/i386-expand.cc (ix86_expand_sse_ptest): Refactor
> 	with optimizations for PTESTZ*, PTESTC* and PTESTNZC*, including
> 	transforming ptestz(x,-1) into ptestz(x,x).
>
> gcc/testsuite/ChangeLog
> 	PR target/90483
> 	PR libstdc++/118416
> 	* gcc.target/config/i386/sse4_1-ptest-8.c: New test case.
> 	* gcc.target/config/i386/sse4_1-ptest-9.c: Likewise.
Should be gcc.target/i386/sse4_1-ptest-9.c, not gcc.target/config/i386/
here (the same applies to sse4_1-ptest-8.c).

The patch LGTM.

--
BR,
Hongtao
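For readers checking the identity behind the ptestz(x,-1) to ptestz(x,x) rewrite, here is a minimal portable sketch that models the two flags PTEST sets, assuming a hypothetical v128 struct of two 64-bit lanes in place of a real SSE register. The names v128, ptest_zf, ptest_cf, all_ones and is_zero_via_* are illustrative only; they are not GCC internals or intrinsic APIs. The flag definitions follow the documented semantics of _mm_testz_si128 and _mm_testc_si128: ZF is set iff (a & b) == 0, and CF is set iff (~a & b) == 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a 128-bit vector: two 64-bit lanes.  */
typedef struct { uint64_t lo, hi; } v128;

/* ZF as set by PTEST: true iff (a & b) == 0.  This is what ptestz returns.  */
static inline bool ptest_zf (v128 a, v128 b)
{
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* CF as set by PTEST: true iff (~a & b) == 0.  This is what ptestc returns.  */
static inline bool ptest_cf (v128 a, v128 b)
{
  return ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0;
}

/* The all-ones vector that vpcmpeqd materializes in the "before" code.  */
static const v128 all_ones = { UINT64_MAX, UINT64_MAX };

/* Original idiom: ptestz(x, -1).  Needs the all-ones operand.  */
static inline bool is_zero_via_ones (v128 x) { return ptest_zf (x, all_ones); }

/* Rewritten form: ptestz(x, x).  Since x & ~0 == x and x & x == x,
   both helpers reduce to "x == 0" for every input.  */
static inline bool is_zero_via_self (v128 x) { return ptest_zf (x, x); }
```

Because both helpers compute ZF on exactly the bits of x, they agree for every input, which is why the all-ones operand (and the vpcmpeqd that creates it) can be dropped. The sketch only illustrates the instruction semantics; it says nothing about how ix86_expand_sse_ptest implements the transformation.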
