On 14/04/2023 10:09, Richard Biener wrote:
On Fri, Apr 14, 2023 at 10:43 AM Andre Vieira (lists)
<andre.simoesdiasvie...@arm.com> wrote:

Resending this to everyone (sorry for the double send Richard).

On 14/04/2023 09:15, Andre Vieira (lists) wrote:
  >
  >
  > On 14/04/2023 07:55, Richard Biener wrote:
  >> On Thu, Apr 13, 2023 at 4:25 PM Andre Vieira (lists)
  >> <andre.simoesdiasvie...@arm.com> wrote:
  >>>
  >>>
  >>>
  >>> On 13/04/2023 15:00, Richard Biener wrote:
  >>>> On Thu, Apr 13, 2023 at 3:00 PM Andre Vieira (lists) via Gcc-patches
  >>>> <gcc-patches@gcc.gnu.org> wrote:
  >>>>>
  >>>>>
  >>>>>
  >>>
  >>> But that's not it, I've been looking at it, and there is code in place
  >>> that does what I expected which is defer the choice of vectype for simd
  >>> clones until vectorizable_simd_clone_call, unfortunately it has a
  >>> mistaken assumption that simdclones don't return :/
  >>
  >> I think that's not it - when the SIMD clone returns a vector we have to
  >> determine the vector type in this function.  We cannot defer this.
  >
  > What's 'this function' here, do you mean we have to determine the
  > vectype in 'vect_get_vector_types_for_stmt' &
  > 'vect_determine_vf_for_stmt' ?

Yes.

Because at that time we don't yet know
  > what clone we will be using, this choice is done inside
  > vectorizable_simd_clone_call. In fact, to choose the simd clone, we need
  > to know the vf as that has to be a multiple of the chosen clone's
  > simdlen. So we simply can't use the simdclone's types (as that depends
  > on the simdlen) to choose the vf because the choice of simdlend depends
  > on the vf. And there was already code in place to handle this,
  > unfortunately that code was wrong and had the wrong assumption that
  > simdclones didn't return (probably was true at some point and bitrotted).

But to compute the VF we need to know the vector types!  We're only
calling vectorizable_* when the VF is final.  That said, the code you quote:

  >>
  >>> see vect_get_vector_types_for_stmt:
  >>> ...
  >>>     if (gimple_get_lhs (stmt) == NULL_TREE

is just for the case of a function without return value.  For this case
it's OK to do nothing - 'vectype' is the vector type of all vector defs
a stmt produces.

For calls with a LHS it should fall through to generic code doing
get_vectype_for_scalar_type on the LHS type.

I think that may work, but right now it will still go and look at the arguments of the call and use the smallest type among them to adjust the nunits (in 'vect_get_vector_types_for_stmt').

In fact (this is just for illustration) if I hack that function like this:
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12745,8 +12745,11 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
       /* The number of units is set according to the smallest scalar
         type (or the largest vector size, but we only support one
         vector size per vectorization).  */
-      scalar_type = vect_get_smallest_scalar_type (stmt_info,
-                                                  TREE_TYPE (vectype));
+      if (simd_clone_call_p (stmt_info->stmt))
+       scalar_type = TREE_TYPE (vectype);
+      else
+       scalar_type = vect_get_smallest_scalar_type (stmt_info,
+                                                    TREE_TYPE (vectype));
       if (scalar_type != TREE_TYPE (vectype))
        {
          if (dump_enabled_p ())

It will use the same types as before without (-m32), like I explained before the -m32 turns the pointer inside MASK_CALL into a 32-bit pointer so now the smallest size is 32-bits. This makes it pick V8SI instead of the original V4DI (scalar return type is DImode). Changing the VF to 8, thus unrolling the loop as it needs to make 2 calls, each handling 4 nunits.



Reply via email to