On Tue, May 19, 2026 at 8:22 AM Roger Sayle <[email protected]> wrote:
>
>
> This patch, inspired by PR target/90483 and libstdc++/118416, implements
> some RTL expansion-time simplifications of ptest. A common idiom for
> testing a vector against zero is to use ptestz(mask,-1).  Alas the code
> generated for this is suboptimal, requiring materialization of an all_ones
> vector.  Given that ptestz(x,y) is defined as (x & y) == 0, an equivalent
> form is ptestz(mask,mask), saving an instruction (if ~0 isn't available).
>
> Consider the function:
>
> typedef long long v2di __attribute__ ((__vector_size__ (16)));
>
> int foo (v2di x)
> {
>   return __builtin_ia32_ptestz128(x,~(v2di){0,0});
> }
>
> with -O2 -mavx2, GCC currently generates:
>
> foo:    vpcmpeqd        %xmm1, %xmm1, %xmm1
>         xorl    %eax, %eax
>         vptest  %xmm1, %xmm0
>         sete    %al
>         ret
>
> with this patch, it now generates:
>
> foo:    xorl    %eax, %eax
>         vptest  %xmm0, %xmm0
>         sete    %al
>         ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2026-05-18  Roger Sayle  <[email protected]>
>
> gcc/ChangeLog
>         PR target/90483
>         PR libstdc++/118416
>         * config/i386/i386-expand.cc (ix86_expand_sse_ptest):  Refactor
>         with optimizations for PTESTZ*, PTESTC* and PTESTNZC*, including
>         transforming ptestz(x,-1) into ptestz(x,x).
>
> gcc/testsuite/ChangeLog
>         PR target/90483
>         PR libstdc++/118416
>         * gcc.target/config/i386/sse4_1-ptest-8.c: New test case.
>         * gcc.target/config/i386/sse4_1-ptest-9.c: Likewise.

These should be under gcc.target/i386/ (e.g. gcc.target/i386/sse4_1-ptest-9.c),
without the config/ component.

The patch LGTM.

>
>


-- 
BR,
Hongtao
