On Wed, Jun 21, 2023 at 2:29 PM Jan Beulich via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > Following two-operand bitwise operations, add another splitter to also > deal with not followed by broadcast all on its own, which can be > expressed as simple embedded broadcast instead once a broadcast operand > is actually permitted in the respective insn. While there also permit > a broadcast operand in the corresponding expander. The patch LGTM. > > gcc/ > > * config/i386/sse.md: New splitters to simplify > not;vec_duplicate as a singular vpternlog. > (one_cmpl<mode>2): Allow broadcast for operand 1. > (<mask_codefor>one_cmpl<mode>2<mask_name>): Likewise. > > gcc/testsuite/ > > * gcc.target/i386/pr100711-6.c: New test. > --- > For the purpose here (and elsewhere) bcst_vector_operand() (really: > bcst_mem_operand()) isn't permissive enough: We'd want it to allow > 128-bit and 256-bit types as well irrespective of AVX512VL being > enabled. This would likely require a new predicate > (bcst_intvec_operand()?) and a new constraint (BR? Bi?). (Yet for name > selection it will want considering that this is applicable to certain > non-calculational FP operations as well.) I think so. > > --- a/gcc/config/i386/sse.md > +++ b/gcc/config/i386/sse.md > @@ -17156,7 +17156,7 @@ > > (define_expand "one_cmpl<mode>2" > [(set (match_operand:VI 0 "register_operand") > - (xor:VI (match_operand:VI 1 "vector_operand") > + (xor:VI (match_operand:VI 1 "bcst_vector_operand") > (match_dup 2)))] > "TARGET_SSE" > { > @@ -17168,7 +17168,7 @@ > > (define_insn "<mask_codefor>one_cmpl<mode>2<mask_name>" > [(set (match_operand:VI 0 "register_operand" "=v,v") > - (xor:VI (match_operand:VI 1 "nonimmediate_operand" "v,m") > + (xor:VI (match_operand:VI 1 "bcst_vector_operand" "vBr,m") > (match_operand:VI 2 "vector_all_ones_operand" "BC,BC")))] > "TARGET_AVX512F > && (!<mask_applied> > @@ -17191,6 +17191,19 @@ > (symbol_ref "<MODE_SIZE> == 64 || TARGET_AVX512VL") > (const_int 1)))]) > > +(define_split > + [(set (match_operand:VI48_AVX512F 0 "register_operand") > + (vec_duplicate:VI48_AVX512F > + (not:<ssescalarmode> > + (match_operand:<ssescalarmode> 1 "nonimmediate_operand"))))] > + "<MODE_SIZE> == 64 || TARGET_AVX512VL > + || (TARGET_AVX512F && !TARGET_PREFER_AVX256)" > + [(set (match_dup 0) > + (xor:VI48_AVX512F > + (vec_duplicate:VI48_AVX512F (match_dup 1)) > + (match_dup 2)))] > + "operands[2] = CONSTM1_RTX (<MODE>mode);") > + > (define_expand "<sse2_avx2>_andnot<mode>3" > [(set (match_operand:VI_AVX2 0 "register_operand") > (and:VI_AVX2 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr100711-6.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512f -mno-avx512vl -mprefer-vector-width=512 -O2" } */ > + > +typedef int v16si __attribute__ ((vector_size (64))); > +typedef long long v8di __attribute__((vector_size (64))); > + > +v16si foo_v16si (const int *a) > +{ > + return (__extension__ (v16si) {~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a, > + ~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a}); > +} > + > +v8di foo_v8di (const long long *a) > +{ > + return (__extension__ (v8di) {~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a}); > +} > + > +/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0x55, > \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}" 2 } } */ >
-- BR, Hongtao