Hi, Richi.

>> Do we really need to have two modes for the optab though, or could we
>> simply require the target to support arbitrary offset modes (given it
>> is implicitly constrained to ptr_mode for the base already)?  Or
>> properly extend/truncate the offset at expansion time, say to ptr_mode
>> or to the mode of sizetype.

For RVV, it's fine to set the stride type to ptr_mode/sizetype by default.
Is it OK if I define strided load/store as a single-mode optab and default
to Pmode for the stride operand?
How about the scale and signed/unsigned operands?
It seems the scale operand can be removed, since we can pass DR_STEP
directly as the stride argument.
But I think we can't remove the signed/unsigned operand, since for stride
mode = SImode the maximum unsigned stride is 2^31 whereas the signed
maximum is 2^30.
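
To make that concrete, a rough sketch of the two shapes in GIMPLE (the
argument order is assumed to follow mask_len_gather_load; all names and
constants here are illustrative, not taken from the patch):

    /* Two-mode form as in this patch: the stride has its own optab mode,
       with an explicit scale operand (signedness taken from the stride
       type).  */
    vect__1 = .MASK_LEN_STRIDED_LOAD (base_, stride_, 4, 0, mask_, len_, 0);

    /* Single-mode form: DR_STEP already folded into a Pmode byte stride,
       so the scale operand disappears.  */
    vect__2 = .MASK_LEN_STRIDED_LOAD (base_, byte_stride_, 0, mask_, len_, 0);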




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-02 21:52
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw; richard.sandiford; rdapp.gcc
Subject: Re: [PATCH V2] OPTABS/IFN: Add 
mask_len_strided_load/mask_len_strided_store OPTABS/IFN
On Tue, 31 Oct 2023, Juzhe-Zhong wrote:
 
> As Richard previously suggested, we should support strided load/store in
> the loop vectorizer instead of hacking the RISC-V backend.
> 
> This patch adds MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.
> 
> The GIMPLE IR is the same as mask_len_gather_load/mask_len_scatter_store,
> but with the vector offset changed into a scalar stride.
 
I see that it follows gather/scatter.  I'll note that when introducing
those we failed to add a specifier for TBAA and alignment info for the
data access.  That means we have to use alias-set zero for the accesses
(I see existing targets use UNSPECs with a not very elaborate MEM anyway,
but TBAA info might have been the "easy" and obvious property to 
preserve).  For alignment we either have to assume unaligned or reject
vectorization of accesses that do not have their original scalar accesses
naturally aligned (aligned according to their mode).  We don't seem
to check that though.
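
As an illustration of the alignment point (this example is mine, not from
the patch): given

  void
  foo (int *restrict dst, char *base, long stride, int n)
  {
    for (int i = 0; i < n; i++)
      /* Scalar load whose alignment depends on base and stride.  */
      dst[i] = *(int *) (base + i * stride);
  }

nothing guarantees the scalar int loads are naturally aligned, so a strided
vector load either has to assume unaligned accesses or vectorization of
such loops has to be rejected.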
 
It might be fine to go forward with this since gather/scatter are broken
in a similar way.
 
Do we really need to have two modes for the optab though, or could we
simply require the target to support arbitrary offset modes (given it
is implicitly constrained to ptr_mode for the base already)?  Or
properly extend/truncate the offset at expansion time, say to ptr_mode
or to the mode of sizetype.
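
A minimal sketch of that expansion-time conversion, assuming the usual
expand helpers (stride here is an illustrative tree, not a name from the
patch):

  /* Bring the scalar stride to ptr_mode, honouring the signedness of its
     type; convert_to_mode truncates or extends as needed.  */
  rtx stride_rtx = expand_normal (stride);
  stride_rtx = convert_to_mode (ptr_mode, stride_rtx,
                                TYPE_UNSIGNED (TREE_TYPE (stride)));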
 
Thanks,
Richard.
> We don't have strided_load/strided_store and mask_strided_load/mask_strided_store
> since it's unlikely RVV will have such optabs, and we can't add patterns that
> we can't test.
>
> 
> gcc/ChangeLog:
> 
> * doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
> * internal-fn.cc (internal_load_fn_p): Ditto.
> (internal_strided_fn_p): Ditto.
> (internal_fn_len_index): Ditto.
> (internal_fn_mask_index): Ditto.
> (internal_fn_stored_value_index): Ditto.
> (internal_strided_fn_supported_p): Ditto.
> * internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
> (MASK_LEN_STRIDED_STORE): Ditto.
> * internal-fn.h (internal_strided_fn_p): Ditto.
> (internal_strided_fn_supported_p): Ditto.
> * optabs.def (OPTAB_CD): Ditto.
> 
> ---
>  gcc/doc/md.texi     | 51 +++++++++++++++++++++++++++++++++++++++++++++
>  gcc/internal-fn.cc  | 44 ++++++++++++++++++++++++++++++++++++++
>  gcc/internal-fn.def |  4 ++++
>  gcc/internal-fn.h   |  2 ++
>  gcc/optabs.def      |  2 ++
>  5 files changed, 103 insertions(+)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index fab2513105a..5bac713a0dd 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5094,6 +5094,32 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>  
> +@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}@var{n}}
> +Load several separate memory locations into a destination vector of mode @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of mode @var{n}.
> +The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
> +For each element index i:
> +
> +@itemize @bullet
> +@item
> +extend the stride to address width, using zero
> +extension if operand 3 is 1 and sign extension if operand 3 is zero;
> +@item
> +multiply the extended stride by operand 4;
> +@item
> +add the result to the base; and
> +@item
> +load the value at that address (operand 1 + @var{i} * multiplied and extended stride) into element @var{i} of operand 0.
> +@end itemize
> +
> +Similar to mask_len_load, the instruction loads at most (operand 6 + operand 7) elements from memory.
> +Bit @var{i} of the mask is set if element @var{i} of the result should
> +be loaded from memory and clear if element @var{i} of the result should be undefined.
> +Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5131,6 +5157,31 @@ at most (operand 6 + operand 7) elements of (operand 
> 4) to memory.
>  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
> stored.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
>  
> +@cindex @code{mask_len_strided_store@var{m}@var{n}} instruction pattern
> +@item @samp{mask_len_strided_store@var{m}@var{n}}
> +Store a vector of mode @var{m} into several distinct memory locations.
> +Operand 0 is a scalar base address and operand 1 is a scalar stride of mode @var{n}.
> +Operand 4 is the vector of values that should be stored, which is of mode @var{m}.
> +The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 0 as base and operand 1 as step.
> +For each element index i:
> +
> +@itemize @bullet
> +@item
> +extend the stride to address width, using zero
> +extension if operand 2 is 1 and sign extension if operand 2 is zero;
> +@item
> +multiply the extended stride by operand 3;
> +@item
> +add the result to the base; and
> +@item
> +store element @var{i} of operand 4 to that address (operand 0 + @var{i} * multiplied and extended stride).
> +@end itemize
> +
> +Similar to mask_len_store, the instruction stores at most (operand 6 + operand 7) elements of (operand 4) to memory.
> +Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
> +Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index e7451b96353..f7f85aa7dde 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4596,6 +4596,7 @@ internal_load_fn_p (internal_fn fn)
>      case IFN_GATHER_LOAD:
>      case IFN_MASK_GATHER_LOAD:
>      case IFN_MASK_LEN_GATHER_LOAD:
> +    case IFN_MASK_LEN_STRIDED_LOAD:
>      case IFN_LEN_LOAD:
>      case IFN_MASK_LEN_LOAD:
>        return true;
> @@ -4648,6 +4649,22 @@ internal_gather_scatter_fn_p (internal_fn fn)
>      }
>  }
>  
> +/* Return true if IFN is some form of strided load or strided store.  */
> +
> +bool
> +internal_strided_fn_p (internal_fn fn)
> +{
> +  switch (fn)
> +    {
> +    case IFN_MASK_LEN_STRIDED_LOAD:
> +    case IFN_MASK_LEN_STRIDED_STORE:
> +      return true;
> +
> +    default:
> +      return false;
> +    }
> +}
> +
>  /* If FN takes a vector len argument, return the index of that argument,
>     otherwise return -1.  */
>  
> @@ -4662,6 +4679,8 @@ internal_fn_len_index (internal_fn fn)
>  
>      case IFN_MASK_LEN_GATHER_LOAD:
>      case IFN_MASK_LEN_SCATTER_STORE:
> +    case IFN_MASK_LEN_STRIDED_LOAD:
> +    case IFN_MASK_LEN_STRIDED_STORE:
>      case IFN_COND_LEN_FMA:
>      case IFN_COND_LEN_FMS:
>      case IFN_COND_LEN_FNMA:
> @@ -4719,6 +4738,8 @@ internal_fn_mask_index (internal_fn fn)
>      case IFN_MASK_SCATTER_STORE:
>      case IFN_MASK_LEN_GATHER_LOAD:
>      case IFN_MASK_LEN_SCATTER_STORE:
> +    case IFN_MASK_LEN_STRIDED_LOAD:
> +    case IFN_MASK_LEN_STRIDED_STORE:
>        return 4;
>  
>      default:
> @@ -4740,6 +4761,7 @@ internal_fn_stored_value_index (internal_fn fn)
>      case IFN_SCATTER_STORE:
>      case IFN_MASK_SCATTER_STORE:
>      case IFN_MASK_LEN_SCATTER_STORE:
> +    case IFN_MASK_LEN_STRIDED_STORE:
>        return 3;
>  
>      case IFN_LEN_STORE:
> @@ -4784,6 +4806,28 @@ internal_gather_scatter_fn_supported_p (internal_fn 
> ifn, tree vector_type,
>    && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)));
>  }
>  
> +/* Return true if the target supports strided load or strided store function
> +   IFN.  For loads, VECTOR_TYPE is the vector type of the load result,
> +   while for stores it is the vector type of the stored data argument.
> +   OFFSET_TYPE is the type of the scalar stride between the memory
> +   addresses of consecutive loaded or stored elements.
> +   SCALE is the amount by which the stride should be multiplied
> +   *after* it has been extended to address width.  */
> +
> +bool
> +internal_strided_fn_supported_p (internal_fn ifn, tree vector_type,
> +                                 tree offset_type, int scale)
> +{
> +  optab optab = direct_internal_fn_optab (ifn);
> +  insn_code icode = convert_optab_handler (optab, TYPE_MODE (vector_type),
> +                                           TYPE_MODE (offset_type));
> +  int output_ops = internal_load_fn_p (ifn) ? 1 : 0;
> +  bool unsigned_p = TYPE_UNSIGNED (offset_type);
> +  return (icode != CODE_FOR_nothing
> +          && insn_operand_matches (icode, 2 + output_ops, GEN_INT (unsigned_p))
> +          && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)));
> +}
> +
>  /* Return true if the target supports IFN_CHECK_{RAW,WAR}_PTRS function IFN
>     for pointers of type TYPE when the accesses have LENGTH bytes and their
>     common byte alignment is ALIGN.  */
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index a2023ab9c3d..0fa532e8f6b 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -199,6 +199,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
>         mask_gather_load, gather_load)
>  DEF_INTERNAL_OPTAB_FN (MASK_LEN_GATHER_LOAD, ECF_PURE,
>         mask_len_gather_load, gather_load)
> +DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_LOAD, ECF_PURE,
> +        mask_len_strided_load, gather_load)
>  
>  DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
>  DEF_INTERNAL_OPTAB_FN (MASK_LEN_LOAD, ECF_PURE, mask_len_load, mask_len_load)
> @@ -208,6 +210,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
>         mask_scatter_store, scatter_store)
>  DEF_INTERNAL_OPTAB_FN (MASK_LEN_SCATTER_STORE, 0,
>         mask_len_scatter_store, scatter_store)
> +DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_STORE, 0,
> +        mask_len_strided_store, scatter_store)
>  
>  DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
>  DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> index 99de13a0199..8379c61dff7 100644
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -235,11 +235,13 @@ extern bool can_interpret_as_conditional_op_p (gimple 
> *, tree *,
>  extern bool internal_load_fn_p (internal_fn);
>  extern bool internal_store_fn_p (internal_fn);
>  extern bool internal_gather_scatter_fn_p (internal_fn);
> +extern bool internal_strided_fn_p (internal_fn);
>  extern int internal_fn_mask_index (internal_fn);
>  extern int internal_fn_len_index (internal_fn);
>  extern int internal_fn_stored_value_index (internal_fn);
>  extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
>      tree, tree, int);
> +extern bool internal_strided_fn_supported_p (internal_fn, tree, tree, int);
>  extern bool internal_check_ptrs_fn_supported_p (internal_fn, tree,
>  poly_uint64, unsigned int);
>  #define VECT_PARTIAL_BIAS_UNSUPPORTED 127
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 2ccbe4197b7..3d85ac5f678 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -98,9 +98,11 @@ OPTAB_CD(mask_len_store_optab, "mask_len_store$a$b")
>  OPTAB_CD(gather_load_optab, "gather_load$a$b")
>  OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
>  OPTAB_CD(mask_len_gather_load_optab, "mask_len_gather_load$a$b")
> +OPTAB_CD(mask_len_strided_load_optab, "mask_len_strided_load$a$b")
>  OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
>  OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
>  OPTAB_CD(mask_len_scatter_store_optab, "mask_len_scatter_store$a$b")
> +OPTAB_CD(mask_len_strided_store_optab, "mask_len_strided_store$a$b")
>  OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
>  OPTAB_CD(vec_init_optab, "vec_init$a$b")
>  
> 
 
-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
 
