> But I think this shows we mid-selected the optab, a convert_move is certainly > not correct unconditionally here (the target might not support that)
Make sense, we can wait a while for the confirmation from Richard S. If convert_move is not designed for Vector (looks like mostly up to a point), I am not sure if we can fix the assertion like below ... else If (VECTOR_INTERGER_TYPE (TREE_TYPE(lhs))) return; else { gcc_checking_assert (INTEGRAL_TYPE_TYPE_P (TREE_TYPE (lhs))); convert_move (lhs_rtx, ops[0].value, 0); } Aka bypass the vector here, but I am afraid this change may make the llrintf (SF => DI) not working on standard name. Let me have a try and keep you posted. Pan -----Original Message----- From: Richard Biener <richard.guent...@gmail.com> Sent: Thursday, October 26, 2023 10:00 PM To: Li, Pan2 <pan2...@intel.com> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang <yanzhang.w...@intel.com>; kito.ch...@gmail.com; Liu, Hongtao <hongtao....@intel.com>; Richard Sandiford <richard.sandif...@arm.com> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer > Am 26.10.2023 um 13:59 schrieb Li, Pan2 <pan2...@intel.com>: > > Thanks Richard for comments. > >> Can you explain why this is necessary? In particular what is lhs_rtx >> mode vs ops[0].value mode? > > For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as below. > > The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]). > The ops[0].value is (reg:VNx2DI 104). > > The restriction removing make the vector rtl enter expand_fn_using_insn and > of course hit the INTEGER_P assertion. But I think this shows we mid-selected the optab, a convert_move is certainly not correct unconditionally here (the target might not support that) > Pan > > -----Original Message----- > From: Richard Biener <richard.guent...@gmail.com> > Sent: Thursday, October 26, 2023 4:38 PM > To: Li, Pan2 <pan2...@intel.com> > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang > <yanzhang.w...@intel.com>; kito.ch...@gmail.com; Liu, Hongtao > <hongtao....@intel.com>; Richard Sandiford <richard.sandif...@arm.com> > Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer > >> On Thu, Oct 26, 2023 at 4:18 AM <pan2...@intel.com> wrote: >> >> From: Pan Li <pan2...@intel.com> >> >> Update in v2: >> >> * Fix one ICE of type assertion. >> * Adjust some test cases for aarch64 sve and riscv vector. >> >> Original log: >> >> The vectoriable_call has one restriction of the size of data type. >> Aka DF to DI is allowed but SF to DI isn't. You may see below message >> when try to vectorize function call like lrintf. >> >> void >> test_lrintf (long *out, float *in, unsigned count) >> { >> for (unsigned i = 0; i < count; i++) >> out[i] = __builtin_lrintf (in[i]); >> } >> >> lrintf.c:5:26: missed: couldn't vectorize loop >> lrintf.c:5:26: missed: not vectorized: unsupported data-type >> >> Then the standard name pattern like lrintmn2 cannot work for different >> data type size like SF => DI. This patch would like to remove this data >> type size check and unblock the standard name like lrintmn2. >> >> The below test are passed for this patch. >> >> * The x86 bootstrap and regression test. >> * The aarch64 regression test. >> * The risc-v regression tests. >> >> gcc/ChangeLog: >> >> * internal-fn.cc (expand_fn_using_insn): Add vector int assertion. >> * tree-vect-stmts.cc (vectorizable_call): Remove size check. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker. >> * gcc.target/aarch64/sve/clz_1.c: Ditto. >> * gcc.target/aarch64/sve/popcount_1.c: Ditto. >> * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto. >> >> Signed-off-by: Pan Li <pan2...@intel.com> >> --- >> gcc/internal-fn.cc | 3 ++- >> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c | 3 +-- >> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c | 3 +-- >> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c | 3 +-- >> .../gcc.target/riscv/rvv/autovec/unop/popcount.c | 2 +- >> gcc/tree-vect-stmts.cc | 13 ------------- >> 6 files changed, 6 insertions(+), 21 deletions(-) >> >> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc >> index 61d5a9e4772..17c0f4c3805 100644 >> --- a/gcc/internal-fn.cc >> +++ b/gcc/internal-fn.cc >> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, >> unsigned int noutputs, >> emit_move_insn (lhs_rtx, ops[0].value); >> else >> { >> - gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))); >> + gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) >> + || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs))); > > Can you explain why this is necessary? In particular what is lhs_rtx > mode vs ops[0].value mode? > >> convert_move (lhs_rtx, ops[0].value, 0); > > I'm not sure convert_move handles vector modes correctly. Richard > probably added this code, CCed. > > Richard. > >> } >> } >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >> b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >> index bdc9856faaf..940d08bbc7b 100644 >> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict >> src, int size) >> } >> >> /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, >> z[0-9]+\.s\n} 1 } } */ >> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, >> z[0-9]+\.d\n} 2 } } */ >> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, >> z[0-9]+\.s\n} 1 } } */ >> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, >> z[0-9]+\.d\n} 1 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >> b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >> index 0c7a4e6d768..58b8ff406d2 100644 >> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict >> src, int size) >> } >> >> /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, >> z[0-9]+\.s\n} 1 } } */ >> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, >> z[0-9]+\.d\n} 2 } } */ >> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, >> z[0-9]+\.s\n} 1 } } */ >> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, >> z[0-9]+\.d\n} 1 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >> b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >> index dfb6f4ac7a5..0eba898307c 100644 >> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t >> *restrict src, int size) >> } >> >> /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, >> z[0-9]+\.s\n} 1 } } */ >> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, >> z[0-9]+\.d\n} 2 } } */ >> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, >> z[0-9]+\.s\n} 1 } } */ >> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, >> z[0-9]+\.d\n} 1 } } */ >> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >> index 585a522aa81..e6e3c70f927 100644 >> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >> @@ -1461,4 +1461,4 @@ main () >> RUN_ALL () >> } >> >> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */ >> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */ >> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >> index a9200767f67..fa4ca0634e8 100644 >> --- a/gcc/tree-vect-stmts.cc >> +++ b/gcc/tree-vect-stmts.cc >> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo, >> >> return false; >> } >> - /* FORNOW: we don't yet support mixtures of vector sizes for calls, >> - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* >> - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed >> - by a pack of the two vectors into an SI vector. We would need >> - separate code to handle direct VnDI->VnSI IFN_CTZs. */ >> - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) >> - { >> - if (dump_enabled_p ()) >> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> - "mismatched vector sizes %T and %T\n", >> - vectype_in, vectype_out); >> - return false; >> - } >> >> if (VECTOR_BOOLEAN_TYPE_P (vectype_out) >> != VECTOR_BOOLEAN_TYPE_P (vectype_in)) >> -- >> 2.34.1 >>