On Fri, Oct 27, 2023 at 4:17 AM Li, Pan2 <pan2...@intel.com> wrote:
>
> Thanks Richard S for comments.
>
> > In other words, I don't think simply removing the test from the vectoriser
> > is correct.  It needs to be replaced by something more selective.
>
> Does it mean we need to check if the internal fun allow different modes/sizes 
> here?
>
> For example, standard name lrintmn2 (m, n mode) is allowed here, while rintm2 
> (only m mode) isn't.

We need to check whether the "size" of the LHS is somehow participating in the
optab query.  I think the direct_internal_fn_info type0/1 members,
when -1 say that.
If none is -1 (and -2) then the LHS has to match one of the arguments (if there
are two different I'm not sure which we'd pick).

So patch-wise the existing check can probably be skipped when
vectorizable_internal_function
returns an IFN but that function should have the very same check when
vectype_out isn't
participating in the optab selection.

Richard.

>
> Pan
>
> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Friday, October 27, 2023 1:47 AM
> To: Richard Biener <richard.guent...@gmail.com>
> Cc: Li, Pan2 <pan2...@intel.com>; gcc-patches@gcc.gnu.org; 
> juzhe.zh...@rivai.ai; Wang, Yanzhang <yanzhang.w...@intel.com>; 
> kito.ch...@gmail.com; Liu, Hongtao <hongtao....@intel.com>
> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer
>
> Richard Biener <richard.guent...@gmail.com> writes:
> >> Am 26.10.2023 um 13:59 schrieb Li, Pan2 <pan2...@intel.com>:
> >>
> >> Thanks Richard for comments.
> >>
> >>> Can you explain why this is necessary?  In particular what is lhs_rtx
> >>> mode vs ops[0].value mode?
> >>
> >> For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as 
> >> below.
> >>
> >> The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]).
> >> The ops[0].value is (reg:VNx2DI 104).
> >>
> >> The restriction removing make the vector rtl enter expand_fn_using_insn 
> >> and of course hit the INTEGER_P assertion.
> >
> > But I think this shows we mid-selected the optab, a convert_move is 
> > certainly not correct unconditionally here (the target might not support 
> > that)
>
> Agreed.  Allowing TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)
> makes sense if the called function allows the input and output modes
> to vary.  That's true for internal functions that eventually map to
> two-mode optabs.  But we can't remove the condition for calls to
> other functions, at least not without some fix-ups.
>
> ISTM that the problem being hit is the one described by the removed
> comment.
>
> In other words, I don't think simply removing the test from the vectoriser
> is correct.  It needs to be replaced by something more selective.
>
> Thanks,
> Richard
>
> >> Pan
> >>
> >> -----Original Message-----
> >> From: Richard Biener <richard.guent...@gmail.com>
> >> Sent: Thursday, October 26, 2023 4:38 PM
> >> To: Li, Pan2 <pan2...@intel.com>
> >> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
> >> <yanzhang.w...@intel.com>; kito.ch...@gmail.com; Liu, Hongtao 
> >> <hongtao....@intel.com>; Richard Sandiford <richard.sandif...@arm.com>
> >> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of 
> >> vectorizer
> >>
> >>> On Thu, Oct 26, 2023 at 4:18 AM <pan2...@intel.com> wrote:
> >>>
> >>> From: Pan Li <pan2...@intel.com>
> >>>
> >>> Update in v2:
> >>>
> >>> * Fix one ICE of type assertion.
> >>> * Adjust some test cases for aarch64 sve and riscv vector.
> >>>
> >>> Original log:
> >>>
> >>> The vectoriable_call has one restriction of the size of data type.
> >>> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> >>> when try to vectorize function call like lrintf.
> >>>
> >>> void
> >>> test_lrintf (long *out, float *in, unsigned count)
> >>> {
> >>>  for (unsigned i = 0; i < count; i++)
> >>>    out[i] = __builtin_lrintf (in[i]);
> >>> }
> >>>
> >>> lrintf.c:5:26: missed: couldn't vectorize loop
> >>> lrintf.c:5:26: missed: not vectorized: unsupported data-type
> >>>
> >>> Then the standard name pattern like lrintmn2 cannot work for different
> >>> data type size like SF => DI. This patch would like to remove this data
> >>> type size check and unblock the standard name like lrintmn2.
> >>>
> >>> The below test are passed for this patch.
> >>>
> >>> * The x86 bootstrap and regression test.
> >>> * The aarch64 regression test.
> >>> * The risc-v regression tests.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>        * internal-fn.cc (expand_fn_using_insn): Add vector int assertion.
> >>>        * tree-vect-stmts.cc (vectorizable_call): Remove size check.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>        * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker.
> >>>        * gcc.target/aarch64/sve/clz_1.c: Ditto.
> >>>        * gcc.target/aarch64/sve/popcount_1.c: Ditto.
> >>>        * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto.
> >>>
> >>> Signed-off-by: Pan Li <pan2...@intel.com>
> >>> ---
> >>> gcc/internal-fn.cc                                  |  3 ++-
> >>> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c      |  3 +--
> >>> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c        |  3 +--
> >>> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c   |  3 +--
> >>> .../gcc.target/riscv/rvv/autovec/unop/popcount.c    |  2 +-
> >>> gcc/tree-vect-stmts.cc                              | 13 -------------
> >>> 6 files changed, 6 insertions(+), 21 deletions(-)
> >>>
> >>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> >>> index 61d5a9e4772..17c0f4c3805 100644
> >>> --- a/gcc/internal-fn.cc
> >>> +++ b/gcc/internal-fn.cc
> >>> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, 
> >>> unsigned int noutputs,
> >>>        emit_move_insn (lhs_rtx, ops[0].value);
> >>>       else
> >>>        {
> >>> -         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
> >>> +         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> >>> +                              || VECTOR_INTEGER_TYPE_P (TREE_TYPE 
> >>> (lhs)));
> >>
> >> Can you explain why this is necessary?  In particular what is lhs_rtx
> >> mode vs ops[0].value mode?
> >>
> >>>          convert_move (lhs_rtx, ops[0].value, 0);
> >>
> >> I'm not sure convert_move handles vector modes correctly.  Richard
> >> probably added this code, CCed.
> >>
> >> Richard.
> >>
> >>>        }
> >>>     }
> >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c 
> >>> b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> >>> index bdc9856faaf..940d08bbc7b 100644
> >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> >>> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t 
> >>> *restrict src, int size)
> >>> }
> >>>
> >>> /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, 
> >>> z[0-9]+\.s\n} 1 } } */
> >>> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, 
> >>> z[0-9]+\.d\n} 2 } } */
> >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, 
> >>> z[0-9]+\.s\n} 1 } } */
> >>> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, 
> >>> z[0-9]+\.d\n} 1 } } */
> >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c 
> >>> b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> >>> index 0c7a4e6d768..58b8ff406d2 100644
> >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> >>> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict 
> >>> src, int size)
> >>> }
> >>>
> >>> /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, 
> >>> z[0-9]+\.s\n} 1 } } */
> >>> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, 
> >>> z[0-9]+\.d\n} 2 } } */
> >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, 
> >>> z[0-9]+\.s\n} 1 } } */
> >>> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, 
> >>> z[0-9]+\.d\n} 1 } } */
> >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c 
> >>> b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> >>> index dfb6f4ac7a5..0eba898307c 100644
> >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> >>> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t 
> >>> *restrict src, int size)
> >>> }
> >>>
> >>> /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, 
> >>> z[0-9]+\.s\n} 1 } } */
> >>> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, 
> >>> z[0-9]+\.d\n} 2 } } */
> >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, 
> >>> z[0-9]+\.s\n} 1 } } */
> >>> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, 
> >>> z[0-9]+\.d\n} 1 } } */
> >>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c 
> >>> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> >>> index 585a522aa81..e6e3c70f927 100644
> >>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> >>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> >>> @@ -1461,4 +1461,4 @@ main ()
> >>>   RUN_ALL ()
> >>> }
> >>>
> >>> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
> >>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */
> >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> >>> index a9200767f67..fa4ca0634e8 100644
> >>> --- a/gcc/tree-vect-stmts.cc
> >>> +++ b/gcc/tree-vect-stmts.cc
> >>> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo,
> >>>
> >>>       return false;
> >>>     }
> >>> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> >>> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> >>> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> >>> -     by a pack of the two vectors into an SI vector.  We would need
> >>> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> >>> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> >>> -    {
> >>> -      if (dump_enabled_p ())
> >>> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >>> -                        "mismatched vector sizes %T and %T\n",
> >>> -                        vectype_in, vectype_out);
> >>> -      return false;
> >>> -    }
> >>>
> >>>   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
> >>>       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> >>> --
> >>> 2.34.1
> >>>

Reply via email to