On Wed, Nov 6, 2019 at 5:06 PM Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> The gather and scatter optabs required the vector offset to be
> the integer equivalent of the vector mode being loaded or stored.
> This patch generalises them so that the two vectors can have different
> element sizes, although they still need to have the same number of
> elements.
>
> One consequence of this is that it's possible (if unlikely)
> for two IFN_GATHER_LOADs to have the same arguments but different
> return types. E.g. the same scalar base and vector of 32-bit offsets
> could be used to load 8-bit elements and to load 16-bit elements.
> From just looking at the arguments, we could wrongly deduce that
> they're equivalent.
>
> I know we saw this happen at one point with IFN_WHILE_ULT,
> and we dealt with it there by passing a zero of the return type
> as an extra argument. Doing the same here also makes the load
> and store functions have the same argument assignment.
>
> For now this patch should be a no-op, but later SVE patches take
> advantage of the new flexibility.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
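(For anyone reading along: a made-up illustration of what the extra
flexibility allows, not something taken from the patch.  With the optabs
now keyed on both the data vector mode and the offset vector mode, a loop
like the one below - 8-bit data gathered through 32-bit offsets - can in
principle be expressed as an IFN_GATHER_LOAD, whereas previously the
offset elements had to be the same width as the data elements.  Whether a
given target actually handles it depends on it providing the
corresponding gather_load pattern; the function name here is invented.)

  unsigned int
  sum_indexed_bytes (const unsigned char *base, const unsigned int *idx, int n)
  {
    unsigned int sum = 0;
    for (int i = 0; i < n; ++i)
      /* 8-bit elements loaded through 32-bit offsets from BASE.  */
      sum += base[idx[i]];
    return sum;
  }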
OK.

Thanks,
Richard.

> Richard
>
>
> 2019-11-06  Richard Sandiford  <richard.sandif...@arm.com>
>
> gcc/
> 	* optabs.def (gather_load_optab, mask_gather_load_optab)
> 	(scatter_store_optab, mask_scatter_store_optab): Turn into
> 	conversion optabs, with the offset mode given explicitly.
> 	* doc/md.texi: Update accordingly.
> 	* config/aarch64/aarch64-sve-builtins-base.cc
> 	(svld1_gather_impl::expand): Likewise.
> 	(svst1_scatter_impl::expand): Likewise.
> 	* internal-fn.c (gather_load_direct, scatter_store_direct): Likewise.
> 	(expand_scatter_store_optab_fn): Likewise.
> 	(direct_gather_load_optab_supported_p): Likewise.
> 	(direct_scatter_store_optab_supported_p): Likewise.
> 	(expand_gather_load_optab_fn): Likewise. Expect the mask argument
> 	to be argument 4.
> 	(internal_fn_mask_index): Return 4 for IFN_MASK_GATHER_LOAD.
> 	(internal_gather_scatter_fn_supported_p): Replace the offset sign
> 	argument with the offset vector type. Require the two vector
> 	types to have the same number of elements but allow their element
> 	sizes to be different. Treat the optabs as conversion optabs.
> 	* internal-fn.h (internal_gather_scatter_fn_supported_p): Update
> 	prototype accordingly.
> 	* optabs-query.c (supports_at_least_one_mode_p): Replace with...
> 	(supports_vec_convert_optab_p): ...this new function.
> 	(supports_vec_gather_load_p): Update accordingly.
> 	(supports_vec_scatter_store_p): Likewise.
> 	* tree-vectorizer.h (vect_gather_scatter_fn_p): Take a vec_info.
> 	Replace the offset sign and bits parameters with a scalar type tree.
> 	* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Likewise.
> 	Pass back the offset vector type instead of the scalar element type.
> 	Allow the offset to be wider than the memory elements. Search for
> 	an offset type that the target supports, stopping once we've
> 	reached the maximum of the element size and pointer size.
> 	Update call to internal_gather_scatter_fn_supported_p.
> 	(vect_check_gather_scatter): Update calls accordingly.
> 	When testing a new scale before knowing the final offset type,
> 	check whether the scale is supported for any signed or unsigned
> 	offset type. Check whether the target supports the source and
> 	target types of a conversion before deciding whether to look
> 	through the conversion. Record the chosen offset_vectype.
> 	* tree-vect-patterns.c (vect_get_gather_scatter_offset_type): Delete.
> 	(vect_recog_gather_scatter_pattern): Get the scalar offset type
> 	directly from the gs_info's offset_vectype instead. Pass a zero
> 	of the result type to IFN_GATHER_LOAD and IFN_MASK_GATHER_LOAD.
> 	* tree-vect-stmts.c (check_load_store_masking): Update call to
> 	internal_gather_scatter_fn_supported_p, passing the offset vector
> 	type recorded in the gs_info.
> 	(vect_truncate_gather_scatter_offset): Update call to
> 	vect_check_gather_scatter, leaving it to search for a valid
> 	offset vector type.
> 	(vect_use_strided_gather_scatters_p): Convert the offset to the
> 	element type of the gs_info's offset_vectype.
> 	(vect_get_gather_scatter_ops): Get the offset vector type directly
> 	from the gs_info.
> 	(vect_get_strided_load_store_ops): Likewise.
> 	(vectorizable_load): Pass a zero of the result type to IFN_GATHER_LOAD
> 	and IFN_MASK_GATHER_LOAD.
> 	* config/aarch64/aarch64-sve.md (gather_load<mode>): Rename to...
> 	(gather_load<mode><v_int_equiv>): ...this.
> 	(mask_gather_load<mode>): Rename to...
> 	(mask_gather_load<mode><v_int_equiv>): ...this.
> 	(scatter_store<mode>): Rename to...
> 	(scatter_store<mode><v_int_equiv>): ...this.
> 	(mask_scatter_store<mode>): Rename to...
> (mask_scatter_store<mode><v_int_equiv>): ...this. > > Index: gcc/optabs.def > =================================================================== > --- gcc/optabs.def 2019-09-30 17:55:27.403766854 +0100 > +++ gcc/optabs.def 2019-11-06 16:03:37.368360019 +0000 > @@ -91,6 +91,10 @@ OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b") > OPTAB_CD(vec_cmpeq_optab, "vec_cmpeq$a$b") > OPTAB_CD(maskload_optab, "maskload$a$b") > OPTAB_CD(maskstore_optab, "maskstore$a$b") > +OPTAB_CD(gather_load_optab, "gather_load$a$b") > +OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b") > +OPTAB_CD(scatter_store_optab, "scatter_store$a$b") > +OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b") > OPTAB_CD(vec_extract_optab, "vec_extract$a$b") > OPTAB_CD(vec_init_optab, "vec_init$a$b") > > @@ -425,11 +429,6 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I > OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a") > OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a") > > -OPTAB_D (gather_load_optab, "gather_load$a") > -OPTAB_D (mask_gather_load_optab, "mask_gather_load$a") > -OPTAB_D (scatter_store_optab, "scatter_store$a") > -OPTAB_D (mask_scatter_store_optab, "mask_scatter_store$a") > - > OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE) > OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES) > OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a") > Index: gcc/doc/md.texi > =================================================================== > --- gcc/doc/md.texi 2019-11-06 12:29:15.562690117 +0000 > +++ gcc/doc/md.texi 2019-11-06 16:03:37.364360047 +0000 > @@ -4959,12 +4959,12 @@ for (j = 0; j < GET_MODE_NUNITS (@var{n} > > This pattern is not allowed to @code{FAIL}. > > -@cindex @code{gather_load@var{m}} instruction pattern > -@item @samp{gather_load@var{m}} > +@cindex @code{gather_load@var{m}@var{n}} instruction pattern > +@item @samp{gather_load@var{m}@var{n}} > Load several separate memory locations into a vector of mode @var{m}. > -Operand 1 is a scalar base address and operand 2 is a vector of > -offsets from that base. Operand 0 is a destination vector with the > -same number of elements as the offset. For each element index @var{i}: > +Operand 1 is a scalar base address and operand 2 is a vector of mode @var{n} > +containing offsets from that base. Operand 0 is a destination vector with > +the same number of elements as @var{n}. For each element index @var{i}: > > @itemize @bullet > @item > @@ -4981,20 +4981,20 @@ load the value at that address into elem > The value of operand 3 does not matter if the offsets are already > address width. > > -@cindex @code{mask_gather_load@var{m}} instruction pattern > -@item @samp{mask_gather_load@var{m}} > -Like @samp{gather_load@var{m}}, but takes an extra mask operand as > +@cindex @code{mask_gather_load@var{m}@var{n}} instruction pattern > +@item @samp{mask_gather_load@var{m}@var{n}} > +Like @samp{gather_load@var{m}@var{n}}, but takes an extra mask operand as > operand 5. Bit @var{i} of the mask is set if element @var{i} > of the result should be loaded from memory and clear if element @var{i} > of the result should be set to zero. > > -@cindex @code{scatter_store@var{m}} instruction pattern > -@item @samp{scatter_store@var{m}} > +@cindex @code{scatter_store@var{m}@var{n}} instruction pattern > +@item @samp{scatter_store@var{m}@var{n}} > Store a vector of mode @var{m} into several distinct memory locations. > -Operand 0 is a scalar base address and operand 1 is a vector of offsets > -from that base. 
Operand 4 is the vector of values that should be stored, > -which has the same number of elements as the offset. For each element > -index @var{i}: > +Operand 0 is a scalar base address and operand 1 is a vector of mode > +@var{n} containing offsets from that base. Operand 4 is the vector of > +values that should be stored, which has the same number of elements as > +@var{n}. For each element index @var{i}: > > @itemize @bullet > @item > @@ -5011,9 +5011,9 @@ store element @var{i} of operand 4 to th > The value of operand 2 does not matter if the offsets are already > address width. > > -@cindex @code{mask_scatter_store@var{m}} instruction pattern > -@item @samp{mask_scatter_store@var{m}} > -Like @samp{scatter_store@var{m}}, but takes an extra mask operand as > +@cindex @code{mask_scatter_store@var{m}@var{n}} instruction pattern > +@item @samp{mask_scatter_store@var{m}@var{n}} > +Like @samp{scatter_store@var{m}@var{n}}, but takes an extra mask operand as > operand 5. Bit @var{i} of the mask is set if element @var{i} > of the result should be stored to memory. > > Index: gcc/config/aarch64/aarch64-sve-builtins-base.cc > =================================================================== > --- gcc/config/aarch64/aarch64-sve-builtins-base.cc 2019-10-29 > 08:59:18.407479604 +0000 > +++ gcc/config/aarch64/aarch64-sve-builtins-base.cc 2019-11-06 > 16:03:37.348360159 +0000 > @@ -1076,7 +1076,9 @@ public: > /* Put the predicate last, as required by mask_gather_load_optab. */ > e.rotate_inputs_left (0, 5); > machine_mode mem_mode = e.memory_vector_mode (); > - insn_code icode = direct_optab_handler (mask_gather_load_optab, > mem_mode); > + machine_mode int_mode = aarch64_sve_int_mode (mem_mode); > + insn_code icode = convert_optab_handler (mask_gather_load_optab, > + mem_mode, int_mode); > return e.use_exact_insn (icode); > } > }; > @@ -2043,8 +2045,10 @@ public: > e.prepare_gather_address_operands (1); > /* Put the predicate last, as required by mask_scatter_store_optab. 
*/ > e.rotate_inputs_left (0, 6); > - insn_code icode = direct_optab_handler (mask_scatter_store_optab, > - e.memory_vector_mode ()); > + machine_mode mem_mode = e.memory_vector_mode (); > + machine_mode int_mode = aarch64_sve_int_mode (mem_mode); > + insn_code icode = convert_optab_handler (mask_scatter_store_optab, > + mem_mode, int_mode); > return e.use_exact_insn (icode); > } > }; > Index: gcc/internal-fn.c > =================================================================== > --- gcc/internal-fn.c 2019-09-12 10:59:55.139303681 +0100 > +++ gcc/internal-fn.c 2019-11-06 16:03:37.368360019 +0000 > @@ -103,11 +103,11 @@ #define not_direct { -2, -2, false } > #define mask_load_direct { -1, 2, false } > #define load_lanes_direct { -1, -1, false } > #define mask_load_lanes_direct { -1, -1, false } > -#define gather_load_direct { -1, -1, false } > +#define gather_load_direct { 3, 1, false } > #define mask_store_direct { 3, 2, false } > #define store_lanes_direct { 0, 0, false } > #define mask_store_lanes_direct { 0, 0, false } > -#define scatter_store_direct { 3, 3, false } > +#define scatter_store_direct { 3, 1, false } > #define unary_direct { 0, 0, true } > #define binary_direct { 0, 0, true } > #define ternary_direct { 0, 0, true } > @@ -2785,7 +2785,8 @@ expand_scatter_store_optab_fn (internal_ > create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE > (mask))); > } > > - insn_code icode = direct_optab_handler (optab, TYPE_MODE (TREE_TYPE > (rhs))); > + insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE > (rhs)), > + TYPE_MODE (TREE_TYPE (offset))); > expand_insn (icode, i, ops); > } > > @@ -2813,11 +2814,12 @@ expand_gather_load_optab_fn (internal_fn > create_integer_operand (&ops[i++], scale_int); > if (optab == mask_gather_load_optab) > { > - tree mask = gimple_call_arg (stmt, 3); > + tree mask = gimple_call_arg (stmt, 4); > rtx mask_rtx = expand_normal (mask); > create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE > (mask))); > } > - insn_code icode = direct_optab_handler (optab, TYPE_MODE (TREE_TYPE > (lhs))); > + insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE > (lhs)), > + TYPE_MODE (TREE_TYPE (offset))); > expand_insn (icode, i, ops); > } > > @@ -3084,11 +3086,11 @@ #define direct_cond_ternary_optab_suppor > #define direct_mask_load_optab_supported_p direct_optab_supported_p > #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p > #define direct_mask_load_lanes_optab_supported_p > multi_vector_optab_supported_p > -#define direct_gather_load_optab_supported_p direct_optab_supported_p > +#define direct_gather_load_optab_supported_p convert_optab_supported_p > #define direct_mask_store_optab_supported_p direct_optab_supported_p > #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p > #define direct_mask_store_lanes_optab_supported_p > multi_vector_optab_supported_p > -#define direct_scatter_store_optab_supported_p direct_optab_supported_p > +#define direct_scatter_store_optab_supported_p convert_optab_supported_p > #define direct_while_optab_supported_p convert_optab_supported_p > #define direct_fold_extract_optab_supported_p direct_optab_supported_p > #define direct_fold_left_optab_supported_p direct_optab_supported_p > @@ -3513,8 +3515,6 @@ internal_fn_mask_index (internal_fn fn) > return 2; > > case IFN_MASK_GATHER_LOAD: > - return 3; > - > case IFN_MASK_SCATTER_STORE: > return 4; > > @@ -3546,27 +3546,30 @@ internal_fn_stored_value_index (internal > IFN. 
For loads, VECTOR_TYPE is the vector type of the load result, > while for stores it is the vector type of the stored data argument. > MEMORY_ELEMENT_TYPE is the type of the memory elements being loaded > - or stored. OFFSET_SIGN is the sign of the offset argument, which is > - only relevant when the offset is narrower than an address. SCALE is > - the amount by which the offset should be multiplied *after* it has > - been extended to address width. */ > + or stored. OFFSET_VECTOR_TYPE is the vector type that holds the > + offset from the shared base address of each loaded or stored element. > + SCALE is the amount by which these offsets should be multiplied > + *after* they have been extended to address width. */ > > bool > internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type, > tree memory_element_type, > - signop offset_sign, int scale) > + tree offset_vector_type, int scale) > { > if (!tree_int_cst_equal (TYPE_SIZE (TREE_TYPE (vector_type)), > TYPE_SIZE (memory_element_type))) > return false; > + if (maybe_ne (TYPE_VECTOR_SUBPARTS (vector_type), > + TYPE_VECTOR_SUBPARTS (offset_vector_type))) > + return false; > optab optab = direct_internal_fn_optab (ifn); > - insn_code icode = direct_optab_handler (optab, TYPE_MODE (vector_type)); > + insn_code icode = convert_optab_handler (optab, TYPE_MODE (vector_type), > + TYPE_MODE (offset_vector_type)); > int output_ops = internal_load_fn_p (ifn) ? 1 : 0; > + bool unsigned_p = TYPE_UNSIGNED (TREE_TYPE (offset_vector_type)); > return (icode != CODE_FOR_nothing > - && insn_operand_matches (icode, 2 + output_ops, > - GEN_INT (offset_sign == UNSIGNED)) > - && insn_operand_matches (icode, 3 + output_ops, > - GEN_INT (scale))); > + && insn_operand_matches (icode, 2 + output_ops, GEN_INT > (unsigned_p)) > + && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale))); > } > > /* Expand STMT as though it were a call to internal function FN. */ > Index: gcc/internal-fn.h > =================================================================== > --- gcc/internal-fn.h 2019-03-08 18:14:26.725006353 +0000 > +++ gcc/internal-fn.h 2019-11-06 16:03:37.368360019 +0000 > @@ -220,7 +220,7 @@ extern bool internal_gather_scatter_fn_p > extern int internal_fn_mask_index (internal_fn); > extern int internal_fn_stored_value_index (internal_fn); > extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree, > - tree, signop, int); > + tree, tree, int); > > extern void expand_internal_call (gcall *); > extern void expand_internal_call (internal_fn, gcall *); > Index: gcc/optabs-query.c > =================================================================== > --- gcc/optabs-query.c 2019-11-06 14:02:26.000000000 +0000 > +++ gcc/optabs-query.c 2019-11-06 16:03:37.368360019 +0000 > @@ -698,14 +698,18 @@ lshift_cheap_p (bool speed_p) > return cheap[speed_p]; > } > > -/* Return true if optab OP supports at least one mode. */ > +/* Return true if vector conversion optab OP supports at least one mode, > + given that the second mode is always an integer vector. 
*/ > > static bool > -supports_at_least_one_mode_p (optab op) > +supports_vec_convert_optab_p (optab op) > { > for (int i = 0; i < NUM_MACHINE_MODES; ++i) > - if (direct_optab_handler (op, (machine_mode) i) != CODE_FOR_nothing) > - return true; > + if (VECTOR_MODE_P ((machine_mode) i)) > + for (int j = MIN_MODE_VECTOR_INT; j < MAX_MODE_VECTOR_INT; ++j) > + if (convert_optab_handler (op, (machine_mode) i, > + (machine_mode) j) != CODE_FOR_nothing) > + return true; > > return false; > } > @@ -722,7 +726,7 @@ supports_vec_gather_load_p () > this_fn_optabs->supports_vec_gather_load_cached = true; > > this_fn_optabs->supports_vec_gather_load > - = supports_at_least_one_mode_p (gather_load_optab); > + = supports_vec_convert_optab_p (gather_load_optab); > > return this_fn_optabs->supports_vec_gather_load; > } > @@ -739,7 +743,7 @@ supports_vec_scatter_store_p () > this_fn_optabs->supports_vec_scatter_store_cached = true; > > this_fn_optabs->supports_vec_scatter_store > - = supports_at_least_one_mode_p (scatter_store_optab); > + = supports_vec_convert_optab_p (scatter_store_optab); > > return this_fn_optabs->supports_vec_scatter_store; > } > Index: gcc/tree-vectorizer.h > =================================================================== > --- gcc/tree-vectorizer.h 2019-11-06 14:02:26.000000000 +0000 > +++ gcc/tree-vectorizer.h 2019-11-06 16:03:37.372359991 +0000 > @@ -1678,8 +1678,8 @@ extern opt_result vect_verify_datarefs_a > extern bool vect_slp_analyze_and_verify_instance_alignment (slp_instance); > extern opt_result vect_analyze_data_ref_accesses (vec_info *); > extern opt_result vect_prune_runtime_alias_test_list (loop_vec_info); > -extern bool vect_gather_scatter_fn_p (bool, bool, tree, tree, unsigned int, > - signop, int, internal_fn *, tree *); > +extern bool vect_gather_scatter_fn_p (vec_info *, bool, bool, tree, tree, > + tree, int, internal_fn *, tree *); > extern bool vect_check_gather_scatter (stmt_vec_info, loop_vec_info, > gather_scatter_info *); > extern opt_result vect_find_stmt_data_reference (loop_p, gimple *, > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2019-11-06 12:28:22.000000000 +0000 > +++ gcc/tree-vect-data-refs.c 2019-11-06 16:03:37.368360019 +0000 > @@ -3660,28 +3660,22 @@ vect_prune_runtime_alias_test_list (loop > /* Check whether we can use an internal function for a gather load > or scatter store. READ_P is true for loads and false for stores. > MASKED_P is true if the load or store is conditional. MEMORY_TYPE is > - the type of the memory elements being loaded or stored. OFFSET_BITS > - is the number of bits in each scalar offset and OFFSET_SIGN is the > - sign of the offset. SCALE is the amount by which the offset should > + the type of the memory elements being loaded or stored. OFFSET_TYPE > + is the type of the offset that is being applied to the invariant > + base address. SCALE is the amount by which the offset should > be multiplied *after* it has been converted to address width. > > - Return true if the function is supported, storing the function > - id in *IFN_OUT and the type of a vector element in *ELEMENT_TYPE_OUT. */ > + Return true if the function is supported, storing the function id in > + *IFN_OUT and the vector type for the offset in *OFFSET_VECTYPE_OUT. 
*/ > > bool > -vect_gather_scatter_fn_p (bool read_p, bool masked_p, tree vectype, > - tree memory_type, unsigned int offset_bits, > - signop offset_sign, int scale, > - internal_fn *ifn_out, tree *element_type_out) > +vect_gather_scatter_fn_p (vec_info *vinfo, bool read_p, bool masked_p, > + tree vectype, tree memory_type, tree offset_type, > + int scale, internal_fn *ifn_out, > + tree *offset_vectype_out) > { > unsigned int memory_bits = tree_to_uhwi (TYPE_SIZE (memory_type)); > unsigned int element_bits = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype))); > - if (offset_bits > element_bits) > - /* Internal functions require the offset to be the same width as > - the vector elements. We can extend narrower offsets, but it isn't > - safe to truncate wider offsets. */ > - return false; > - > if (element_bits != memory_bits) > /* For now the vector elements must be the same width as the > memory elements. */ > @@ -3694,14 +3688,28 @@ vect_gather_scatter_fn_p (bool read_p, b > else > ifn = masked_p ? IFN_MASK_SCATTER_STORE : IFN_SCATTER_STORE; > > - /* Test whether the target supports this combination. */ > - if (!internal_gather_scatter_fn_supported_p (ifn, vectype, memory_type, > - offset_sign, scale)) > - return false; > + for (;;) > + { > + tree offset_vectype = get_vectype_for_scalar_type (vinfo, offset_type); > + if (!offset_vectype) > + return false; > > - *ifn_out = ifn; > - *element_type_out = TREE_TYPE (vectype); > - return true; > + /* Test whether the target supports this combination. */ > + if (internal_gather_scatter_fn_supported_p (ifn, vectype, memory_type, > + offset_vectype, scale)) > + { > + *ifn_out = ifn; > + *offset_vectype_out = offset_vectype; > + return true; > + } > + > + if (TYPE_PRECISION (offset_type) >= POINTER_SIZE > + && TYPE_PRECISION (offset_type) >= element_bits) > + return false; > + > + offset_type = build_nonstandard_integer_type > + (TYPE_PRECISION (offset_type) * 2, TYPE_UNSIGNED (offset_type)); > + } > } > > /* STMT_INFO is a call to an internal gather load or scatter store function. > @@ -3744,7 +3752,7 @@ vect_check_gather_scatter (stmt_vec_info > machine_mode pmode; > int punsignedp, reversep, pvolatilep = 0; > internal_fn ifn; > - tree element_type; > + tree offset_vectype; > bool masked_p = false; > > /* See whether this is already a call to a gather/scatter internal > function. > @@ -3905,13 +3913,18 @@ vect_check_gather_scatter (stmt_vec_info > { > int new_scale = tree_to_shwi (op1); > /* Only treat this as a scaling operation if the target > - supports it. */ > + supports it for at least some offset type. */ > if (use_ifn_p > - && !vect_gather_scatter_fn_p (DR_IS_READ (dr), masked_p, > - vectype, memory_type, 1, > - TYPE_SIGN (TREE_TYPE (op0)), > + && !vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr), > + masked_p, vectype, > memory_type, > + signed_char_type_node, > + new_scale, &ifn, > + &offset_vectype) > + && !vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr), > + masked_p, vectype, > memory_type, > + unsigned_char_type_node, > new_scale, &ifn, > - &element_type)) > + &offset_vectype)) > break; > scale = new_scale; > off = op0; > @@ -3925,6 +3938,16 @@ vect_check_gather_scatter (stmt_vec_info > if (!POINTER_TYPE_P (TREE_TYPE (op0)) > && !INTEGRAL_TYPE_P (TREE_TYPE (op0))) > break; > + > + /* Don't include the conversion if the target is happy with > + the current offset type. 
*/ > + if (use_ifn_p > + && vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr), > + masked_p, vectype, memory_type, > + TREE_TYPE (off), scale, &ifn, > + &offset_vectype)) > + break; > + > if (TYPE_PRECISION (TREE_TYPE (op0)) > == TYPE_PRECISION (TREE_TYPE (off))) > { > @@ -3932,14 +3955,6 @@ vect_check_gather_scatter (stmt_vec_info > continue; > } > > - /* The internal functions need the offset to be the same width > - as the elements of VECTYPE. Don't include operations that > - cast the offset from that width to a different width. */ > - if (use_ifn_p > - && (int_size_in_bytes (TREE_TYPE (vectype)) > - == int_size_in_bytes (TREE_TYPE (off)))) > - break; > - > if (TYPE_PRECISION (TREE_TYPE (op0)) > < TYPE_PRECISION (TREE_TYPE (off))) > { > @@ -3966,10 +3981,9 @@ vect_check_gather_scatter (stmt_vec_info > > if (use_ifn_p) > { > - if (!vect_gather_scatter_fn_p (DR_IS_READ (dr), masked_p, vectype, > - memory_type, TYPE_PRECISION (offtype), > - TYPE_SIGN (offtype), scale, &ifn, > - &element_type)) > + if (!vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr), masked_p, > + vectype, memory_type, offtype, scale, > + &ifn, &offset_vectype)) > return false; > } > else > @@ -3989,7 +4003,8 @@ vect_check_gather_scatter (stmt_vec_info > return false; > > ifn = IFN_LAST; > - element_type = TREE_TYPE (vectype); > + /* The offset vector type will be read from DECL when needed. */ > + offset_vectype = NULL_TREE; > } > > info->ifn = ifn; > @@ -3997,9 +4012,9 @@ vect_check_gather_scatter (stmt_vec_info > info->base = base; > info->offset = off; > info->offset_dt = vect_unknown_def_type; > - info->offset_vectype = NULL_TREE; > + info->offset_vectype = offset_vectype; > info->scale = scale; > - info->element_type = element_type; > + info->element_type = TREE_TYPE (vectype); > info->memory_type = memory_type; > return true; > } > Index: gcc/tree-vect-patterns.c > =================================================================== > --- gcc/tree-vect-patterns.c 2019-11-06 14:02:26.000000000 +0000 > +++ gcc/tree-vect-patterns.c 2019-11-06 16:03:37.372359991 +0000 > @@ -4498,28 +4498,6 @@ vect_get_load_store_mask (stmt_vec_info > gcc_unreachable (); > } > > -/* Return the scalar offset type that an internal gather/scatter function > - should use. GS_INFO describes the gather/scatter operation. */ > - > -static tree > -vect_get_gather_scatter_offset_type (gather_scatter_info *gs_info) > -{ > - tree offset_type = TREE_TYPE (gs_info->offset); > - unsigned int element_bits = tree_to_uhwi (TYPE_SIZE > (gs_info->element_type)); > - > - /* Enforced by vect_check_gather_scatter. */ > - unsigned int offset_bits = TYPE_PRECISION (offset_type); > - gcc_assert (element_bits >= offset_bits); > - > - /* If the offset is narrower than the elements, extend it according > - to its sign. */ > - if (element_bits > offset_bits) > - return build_nonstandard_integer_type (element_bits, > - TYPE_UNSIGNED (offset_type)); > - > - return offset_type; > -} > - > /* Return MASK if MASK is suitable for masking an operation on vectors > of type VECTYPE, otherwise convert it into such a form and return > the result. Associate any conversion statements with STMT_INFO's > @@ -4604,7 +4582,7 @@ vect_recog_gather_scatter_pattern (stmt_ > /* Get the invariant base and non-invariant offset, converting the > latter to the same width as the vector elements. 
*/ > tree base = gs_info.base; > - tree offset_type = vect_get_gather_scatter_offset_type (&gs_info); > + tree offset_type = TREE_TYPE (gs_info.offset_vectype); > tree offset = vect_add_conversion_to_pattern (offset_type, gs_info.offset, > stmt_info); > > @@ -4613,12 +4591,13 @@ vect_recog_gather_scatter_pattern (stmt_ > gcall *pattern_stmt; > if (DR_IS_READ (dr)) > { > + tree zero = build_zero_cst (gs_info.element_type); > if (mask != NULL) > - pattern_stmt = gimple_build_call_internal (gs_info.ifn, 4, base, > - offset, scale, mask); > + pattern_stmt = gimple_build_call_internal (gs_info.ifn, 5, base, > + offset, scale, zero, mask); > else > - pattern_stmt = gimple_build_call_internal (gs_info.ifn, 3, base, > - offset, scale); > + pattern_stmt = gimple_build_call_internal (gs_info.ifn, 4, base, > + offset, scale, zero); > tree load_lhs = vect_recog_temp_ssa_var (gs_info.element_type, NULL); > gimple_call_set_lhs (pattern_stmt, load_lhs); > } > Index: gcc/tree-vect-stmts.c > =================================================================== > --- gcc/tree-vect-stmts.c 2019-11-06 14:02:26.000000000 +0000 > +++ gcc/tree-vect-stmts.c 2019-11-06 16:03:37.372359991 +0000 > @@ -1910,10 +1910,9 @@ check_load_store_masking (loop_vec_info > internal_fn ifn = (is_load > ? IFN_MASK_GATHER_LOAD > : IFN_MASK_SCATTER_STORE); > - tree offset_type = TREE_TYPE (gs_info->offset); > if (!internal_gather_scatter_fn_supported_p (ifn, vectype, > gs_info->memory_type, > - TYPE_SIGN (offset_type), > + gs_info->offset_vectype, > gs_info->scale)) > { > if (dump_enabled_p ()) > @@ -2046,35 +2045,33 @@ vect_truncate_gather_scatter_offset (stm > if (!wi::multiple_of_p (wi::to_widest (step), scale, SIGNED, &factor)) > continue; > > - /* See whether we can calculate (COUNT - 1) * STEP / SCALE > - in OFFSET_BITS bits. */ > + /* Determine the minimum precision of (COUNT - 1) * STEP / SCALE. */ > widest_int range = wi::mul (count, factor, SIGNED, &overflow); > if (overflow) > continue; > signop sign = range >= 0 ? UNSIGNED : SIGNED; > - if (wi::min_precision (range, sign) > element_bits) > - { > - overflow = wi::OVF_UNKNOWN; > - continue; > - } > + unsigned int min_offset_bits = wi::min_precision (range, sign); > > - /* See whether the target supports the operation. */ > + /* Find the narrowest viable offset type. */ > + unsigned int offset_bits = 1U << ceil_log2 (min_offset_bits); > + tree offset_type = build_nonstandard_integer_type (offset_bits, > + sign == UNSIGNED); > + > + /* See whether the target supports the operation with an offset > + no narrower than OFFSET_TYPE. */ > tree memory_type = TREE_TYPE (DR_REF (dr)); > - if (!vect_gather_scatter_fn_p (DR_IS_READ (dr), masked_p, vectype, > - memory_type, element_bits, sign, scale, > - &gs_info->ifn, &gs_info->element_type)) > + if (!vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr), masked_p, > + vectype, memory_type, offset_type, scale, > + &gs_info->ifn, &gs_info->offset_vectype)) > continue; > > - tree offset_type = build_nonstandard_integer_type (element_bits, > - sign == UNSIGNED); > - > gs_info->decl = NULL_TREE; > /* Logically the sum of DR_BASE_ADDRESS, DR_INIT and DR_OFFSET, > but we don't need to store that here. 
*/ > gs_info->base = NULL_TREE; > + gs_info->element_type = TREE_TYPE (vectype); > gs_info->offset = fold_convert (offset_type, step); > gs_info->offset_dt = vect_constant_def; > - gs_info->offset_vectype = NULL_TREE; > gs_info->scale = scale; > gs_info->memory_type = memory_type; > return true; > @@ -2104,22 +2101,12 @@ vect_use_strided_gather_scatters_p (stmt > return vect_truncate_gather_scatter_offset (stmt_info, loop_vinfo, > masked_p, gs_info); > > - scalar_mode element_mode = SCALAR_TYPE_MODE (gs_info->element_type); > - unsigned int element_bits = GET_MODE_BITSIZE (element_mode); > - tree offset_type = TREE_TYPE (gs_info->offset); > - unsigned int offset_bits = TYPE_PRECISION (offset_type); > - > - /* Enforced by vect_check_gather_scatter. */ > - gcc_assert (element_bits >= offset_bits); > + tree old_offset_type = TREE_TYPE (gs_info->offset); > + tree new_offset_type = TREE_TYPE (gs_info->offset_vectype); > > - /* If the elements are wider than the offset, convert the offset to the > - same width, without changing its sign. */ > - if (element_bits > offset_bits) > - { > - bool unsigned_p = TYPE_UNSIGNED (offset_type); > - offset_type = build_nonstandard_integer_type (element_bits, > unsigned_p); > - gs_info->offset = fold_convert (offset_type, gs_info->offset); > - } > + gcc_assert (TYPE_PRECISION (new_offset_type) > + >= TYPE_PRECISION (old_offset_type)); > + gs_info->offset = fold_convert (new_offset_type, gs_info->offset); > > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > @@ -2963,7 +2950,6 @@ vect_get_gather_scatter_ops (class loop > gather_scatter_info *gs_info, > tree *dataref_ptr, tree *vec_offset) > { > - vec_info *vinfo = stmt_info->vinfo; > gimple_seq stmts = NULL; > *dataref_ptr = force_gimple_operand (gs_info->base, &stmts, true, > NULL_TREE); > if (stmts != NULL) > @@ -2973,10 +2959,8 @@ vect_get_gather_scatter_ops (class loop > new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts); > gcc_assert (!new_bb); > } > - tree offset_type = TREE_TYPE (gs_info->offset); > - tree offset_vectype = get_vectype_for_scalar_type (vinfo, offset_type); > *vec_offset = vect_get_vec_def_for_operand (gs_info->offset, stmt_info, > - offset_vectype); > + gs_info->offset_vectype); > } > > /* Prepare to implement a grouped or strided load or store using > @@ -3009,8 +2993,7 @@ vect_get_strided_load_store_ops (stmt_ve > /* The offset given in GS_INFO can have pointer type, so use the element > type of the vector instead. */ > tree offset_type = TREE_TYPE (gs_info->offset); > - tree offset_vectype = get_vectype_for_scalar_type (loop_vinfo, > offset_type); > - offset_type = TREE_TYPE (offset_vectype); > + offset_type = TREE_TYPE (gs_info->offset_vectype); > > /* Calculate X = DR_STEP / SCALE and convert it to the appropriate type. > */ > tree step = size_binop (EXACT_DIV_EXPR, DR_STEP (dr), > @@ -3019,7 +3002,7 @@ vect_get_strided_load_store_ops (stmt_ve > step = force_gimple_operand (step, &stmts, true, NULL_TREE); > > /* Create {0, X, X*2, X*3, ...}. 
*/ > - *vec_offset = gimple_build (&stmts, VEC_SERIES_EXPR, offset_vectype, > + *vec_offset = gimple_build (&stmts, VEC_SERIES_EXPR, > gs_info->offset_vectype, > build_zero_cst (offset_type), step); > if (stmts) > gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts); > @@ -9442,16 +9425,17 @@ vectorizable_load (stmt_vec_info stmt_in > > if (memory_access_type == VMAT_GATHER_SCATTER) > { > + tree zero = build_zero_cst (vectype); > tree scale = size_int (gs_info.scale); > gcall *call; > if (loop_masks) > call = gimple_build_call_internal > - (IFN_MASK_GATHER_LOAD, 4, dataref_ptr, > - vec_offset, scale, final_mask); > + (IFN_MASK_GATHER_LOAD, 5, dataref_ptr, > + vec_offset, scale, zero, final_mask); > else > call = gimple_build_call_internal > - (IFN_GATHER_LOAD, 3, dataref_ptr, > - vec_offset, scale); > + (IFN_GATHER_LOAD, 4, dataref_ptr, > + vec_offset, scale, zero); > gimple_call_set_nothrow (call, true); > new_stmt = call; > data_ref = NULL_TREE; > Index: gcc/config/aarch64/aarch64-sve.md > =================================================================== > --- gcc/config/aarch64/aarch64-sve.md 2019-10-29 17:01:12.639889324 +0000 > +++ gcc/config/aarch64/aarch64-sve.md 2019-11-06 16:03:37.352360131 +0000 > @@ -1336,7 +1336,7 @@ (define_insn "@aarch64_ldnt1<mode>" > ;; ------------------------------------------------------------------------- > > ;; Unpredicated gather loads. > -(define_expand "gather_load<mode>" > +(define_expand "gather_load<mode><v_int_equiv>" > [(set (match_operand:SVE_SD 0 "register_operand") > (unspec:SVE_SD > [(match_dup 5) > @@ -1354,7 +1354,7 @@ (define_expand "gather_load<mode>" > > ;; Predicated gather loads for 32-bit elements. Operand 3 is true for > ;; unsigned extension and false for signed extension. > -(define_insn "mask_gather_load<mode>" > +(define_insn "mask_gather_load<mode><v_int_equiv>" > [(set (match_operand:SVE_S 0 "register_operand" "=w, w, w, w, w, w") > (unspec:SVE_S > [(match_operand:VNx4BI 5 "register_operand" "Upl, Upl, Upl, Upl, > Upl, Upl") > @@ -1376,7 +1376,7 @@ (define_insn "mask_gather_load<mode>" > > ;; Predicated gather loads for 64-bit elements. The value of operand 3 > ;; doesn't matter in this case. > -(define_insn "mask_gather_load<mode>" > +(define_insn "mask_gather_load<mode><v_int_equiv>" > [(set (match_operand:SVE_D 0 "register_operand" "=w, w, w, w") > (unspec:SVE_D > [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl, Upl, Upl") > @@ -1395,7 +1395,7 @@ (define_insn "mask_gather_load<mode>" > ) > > ;; Likewise, but with the offset being sign-extended from 32 bits. > -(define_insn "*mask_gather_load<mode>_sxtw" > +(define_insn "*mask_gather_load<mode><v_int_equiv>_sxtw" > [(set (match_operand:SVE_D 0 "register_operand" "=w, w") > (unspec:SVE_D > [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl") > @@ -1417,7 +1417,7 @@ (define_insn "*mask_gather_load<mode>_sx > ) > > ;; Likewise, but with the offset being zero-extended from 32 bits. > -(define_insn "*mask_gather_load<mode>_uxtw" > +(define_insn "*mask_gather_load<mode><v_int_equiv>_uxtw" > [(set (match_operand:SVE_D 0 "register_operand" "=w, w") > (unspec:SVE_D > [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl") > @@ -2054,7 +2054,7 @@ (define_insn "@aarch64_stnt1<mode>" > ;; ------------------------------------------------------------------------- > > ;; Unpredicated scatter stores. 
> -(define_expand "scatter_store<mode>" > +(define_expand "scatter_store<mode><v_int_equiv>" > [(set (mem:BLK (scratch)) > (unspec:BLK > [(match_dup 5) > @@ -2072,7 +2072,7 @@ (define_expand "scatter_store<mode>" > > ;; Predicated scatter stores for 32-bit elements. Operand 2 is true for > ;; unsigned extension and false for signed extension. > -(define_insn "mask_scatter_store<mode>" > +(define_insn "mask_scatter_store<mode><v_int_equiv>" > [(set (mem:BLK (scratch)) > (unspec:BLK > [(match_operand:VNx4BI 5 "register_operand" "Upl, Upl, Upl, Upl, > Upl, Upl") > @@ -2094,7 +2094,7 @@ (define_insn "mask_scatter_store<mode>" > > ;; Predicated scatter stores for 64-bit elements. The value of operand 2 > ;; doesn't matter in this case. > -(define_insn "mask_scatter_store<mode>" > +(define_insn "mask_scatter_store<mode><v_int_equiv>" > [(set (mem:BLK (scratch)) > (unspec:BLK > [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl, Upl, Upl") > @@ -2113,7 +2113,7 @@ (define_insn "mask_scatter_store<mode>" > ) > > ;; Likewise, but with the offset being sign-extended from 32 bits. > -(define_insn_and_rewrite "*mask_scatter_store<mode>_sxtw" > +(define_insn_and_rewrite "*mask_scatter_store<mode><v_int_equiv>_sxtw" > [(set (mem:BLK (scratch)) > (unspec:BLK > [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl") > @@ -2139,7 +2139,7 @@ (define_insn_and_rewrite "*mask_scatter_ > ) > > ;; Likewise, but with the offset being zero-extended from 32 bits. > -(define_insn "*mask_scatter_store<mode>_uxtw" > +(define_insn "*mask_scatter_store<mode><v_int_equiv>_uxtw" > [(set (mem:BLK (scratch)) > (unspec:BLK > [(match_operand:VNx2BI 5 "register_operand" "Upl, Upl")