On Fri, Oct 27, 2023 at 4:17 AM Li, Pan2 <pan2...@intel.com> wrote: > > Thanks Richard S for comments. > > > In other words, I don't think simply removing the test from the vectoriser > > is correct. It needs to be replaced by something more selective. > > Does it mean we need to check if the internal fun allow different modes/sizes > here? > > For example, standard name lrintmn2 (m, n mode) is allowed here, while rintm2 > (only m mode) isn't.
We need to check whether the "size" of the LHS is somehow participating in the optab query. I think the direct_internal_fn_info type0/1 members, when -1 say that. If none is -1 (and -2) then the LHS has to match one of the arguments (if there are two different I'm not sure which we'd pick). So patch-wise the existing check can probably be skipped when vectorizable_internal_function returns an IFN but that function should have the very same check when vectype_out isn't participating in the optab selection. Richard. > > Pan > > -----Original Message----- > From: Richard Sandiford <richard.sandif...@arm.com> > Sent: Friday, October 27, 2023 1:47 AM > To: Richard Biener <richard.guent...@gmail.com> > Cc: Li, Pan2 <pan2...@intel.com>; gcc-patches@gcc.gnu.org; > juzhe.zh...@rivai.ai; Wang, Yanzhang <yanzhang.w...@intel.com>; > kito.ch...@gmail.com; Liu, Hongtao <hongtao....@intel.com> > Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer > > Richard Biener <richard.guent...@gmail.com> writes: > >> Am 26.10.2023 um 13:59 schrieb Li, Pan2 <pan2...@intel.com>: > >> > >> Thanks Richard for comments. > >> > >>> Can you explain why this is necessary? In particular what is lhs_rtx > >>> mode vs ops[0].value mode? > >> > >> For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as > >> below. > >> > >> The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]). > >> The ops[0].value is (reg:VNx2DI 104). > >> > >> The restriction removing make the vector rtl enter expand_fn_using_insn > >> and of course hit the INTEGER_P assertion. > > > > But I think this shows we mid-selected the optab, a convert_move is > > certainly not correct unconditionally here (the target might not support > > that) > > Agreed. Allowing TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out) > makes sense if the called function allows the input and output modes > to vary. That's true for internal functions that eventually map to > two-mode optabs. But we can't remove the condition for calls to > other functions, at least not without some fix-ups. > > ISTM that the problem being hit is the one described by the removed > comment. > > In other words, I don't think simply removing the test from the vectoriser > is correct. It needs to be replaced by something more selective. > > Thanks, > Richard > > >> Pan > >> > >> -----Original Message----- > >> From: Richard Biener <richard.guent...@gmail.com> > >> Sent: Thursday, October 26, 2023 4:38 PM > >> To: Li, Pan2 <pan2...@intel.com> > >> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang > >> <yanzhang.w...@intel.com>; kito.ch...@gmail.com; Liu, Hongtao > >> <hongtao....@intel.com>; Richard Sandiford <richard.sandif...@arm.com> > >> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of > >> vectorizer > >> > >>> On Thu, Oct 26, 2023 at 4:18 AM <pan2...@intel.com> wrote: > >>> > >>> From: Pan Li <pan2...@intel.com> > >>> > >>> Update in v2: > >>> > >>> * Fix one ICE of type assertion. > >>> * Adjust some test cases for aarch64 sve and riscv vector. > >>> > >>> Original log: > >>> > >>> The vectoriable_call has one restriction of the size of data type. > >>> Aka DF to DI is allowed but SF to DI isn't. You may see below message > >>> when try to vectorize function call like lrintf. > >>> > >>> void > >>> test_lrintf (long *out, float *in, unsigned count) > >>> { > >>> for (unsigned i = 0; i < count; i++) > >>> out[i] = __builtin_lrintf (in[i]); > >>> } > >>> > >>> lrintf.c:5:26: missed: couldn't vectorize loop > >>> lrintf.c:5:26: missed: not vectorized: unsupported data-type > >>> > >>> Then the standard name pattern like lrintmn2 cannot work for different > >>> data type size like SF => DI. This patch would like to remove this data > >>> type size check and unblock the standard name like lrintmn2. > >>> > >>> The below test are passed for this patch. > >>> > >>> * The x86 bootstrap and regression test. > >>> * The aarch64 regression test. > >>> * The risc-v regression tests. > >>> > >>> gcc/ChangeLog: > >>> > >>> * internal-fn.cc (expand_fn_using_insn): Add vector int assertion. > >>> * tree-vect-stmts.cc (vectorizable_call): Remove size check. > >>> > >>> gcc/testsuite/ChangeLog: > >>> > >>> * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker. > >>> * gcc.target/aarch64/sve/clz_1.c: Ditto. > >>> * gcc.target/aarch64/sve/popcount_1.c: Ditto. > >>> * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto. > >>> > >>> Signed-off-by: Pan Li <pan2...@intel.com> > >>> --- > >>> gcc/internal-fn.cc | 3 ++- > >>> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c | 3 +-- > >>> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c | 3 +-- > >>> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c | 3 +-- > >>> .../gcc.target/riscv/rvv/autovec/unop/popcount.c | 2 +- > >>> gcc/tree-vect-stmts.cc | 13 ------------- > >>> 6 files changed, 6 insertions(+), 21 deletions(-) > >>> > >>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > >>> index 61d5a9e4772..17c0f4c3805 100644 > >>> --- a/gcc/internal-fn.cc > >>> +++ b/gcc/internal-fn.cc > >>> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, > >>> unsigned int noutputs, > >>> emit_move_insn (lhs_rtx, ops[0].value); > >>> else > >>> { > >>> - gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))); > >>> + gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) > >>> + || VECTOR_INTEGER_TYPE_P (TREE_TYPE > >>> (lhs))); > >> > >> Can you explain why this is necessary? In particular what is lhs_rtx > >> mode vs ops[0].value mode? > >> > >>> convert_move (lhs_rtx, ops[0].value, 0); > >> > >> I'm not sure convert_move handles vector modes correctly. Richard > >> probably added this code, CCed. > >> > >> Richard. > >> > >>> } > >>> } > >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c > >>> b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c > >>> index bdc9856faaf..940d08bbc7b 100644 > >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c > >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c > >>> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t > >>> *restrict src, int size) > >>> } > >>> > >>> /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, > >>> z[0-9]+\.s\n} 1 } } */ > >>> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, > >>> z[0-9]+\.d\n} 2 } } */ > >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, > >>> z[0-9]+\.s\n} 1 } } */ > >>> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, > >>> z[0-9]+\.d\n} 1 } } */ > >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c > >>> b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c > >>> index 0c7a4e6d768..58b8ff406d2 100644 > >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c > >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c > >>> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict > >>> src, int size) > >>> } > >>> > >>> /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, > >>> z[0-9]+\.s\n} 1 } } */ > >>> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, > >>> z[0-9]+\.d\n} 2 } } */ > >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, > >>> z[0-9]+\.s\n} 1 } } */ > >>> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, > >>> z[0-9]+\.d\n} 1 } } */ > >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c > >>> b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c > >>> index dfb6f4ac7a5..0eba898307c 100644 > >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c > >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c > >>> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t > >>> *restrict src, int size) > >>> } > >>> > >>> /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, > >>> z[0-9]+\.s\n} 1 } } */ > >>> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, > >>> z[0-9]+\.d\n} 2 } } */ > >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, > >>> z[0-9]+\.s\n} 1 } } */ > >>> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, > >>> z[0-9]+\.d\n} 1 } } */ > >>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c > >>> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c > >>> index 585a522aa81..e6e3c70f927 100644 > >>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c > >>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c > >>> @@ -1461,4 +1461,4 @@ main () > >>> RUN_ALL () > >>> } > >>> > >>> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */ > >>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */ > >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > >>> index a9200767f67..fa4ca0634e8 100644 > >>> --- a/gcc/tree-vect-stmts.cc > >>> +++ b/gcc/tree-vect-stmts.cc > >>> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo, > >>> > >>> return false; > >>> } > >>> - /* FORNOW: we don't yet support mixtures of vector sizes for calls, > >>> - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* > >>> - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed > >>> - by a pack of the two vectors into an SI vector. We would need > >>> - separate code to handle direct VnDI->VnSI IFN_CTZs. */ > >>> - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) > >>> - { > >>> - if (dump_enabled_p ()) > >>> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > >>> - "mismatched vector sizes %T and %T\n", > >>> - vectype_in, vectype_out); > >>> - return false; > >>> - } > >>> > >>> if (VECTOR_BOOLEAN_TYPE_P (vectype_out) > >>> != VECTOR_BOOLEAN_TYPE_P (vectype_in)) > >>> -- > >>> 2.34.1 > >>>