On Mon, Sep 18, 2017 at 1:58 PM, Richard Sandiford
<richard.sandif...@linaro.org> wrote:
> The vectoriser aligned vectors to TYPE_ALIGN unconditionally, although
> there was also a hard-coded assumption that this was equal to the type
> size. This was inconvenient for SVE for two reasons:
>
> - When compiling for a specific power-of-2 SVE vector length, we might
>   want to align to a full vector. However, the TYPE_ALIGN is governed
>   by the ABI alignment, which is 128 bits regardless of size.
>
> - For vector-length-agnostic code it doesn't usually make sense to align,
>   since the runtime vector length might not be a power of two. Even for
>   power of two sizes, there's no guarantee that aligning to the previous
>   16 bytes will be an improvement.
>
> This patch therefore adds a target hook to control the preferred
> vectoriser (as opposed to ABI) alignment.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Also tested by comparing the testsuite assembly output on at least one
> target per CPU directory. OK to install?
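
For readers following along, here is a rough standalone sketch of the arithmetic that changes once misalignment is measured against a target alignment rather than the vector size. It is not part of the patch: the helper names (peel_iterations, known_alignment) and their parameters are made up for illustration and only mirror the expressions used in vect_enhance_data_refs_alignment and vect_known_alignment_in_bytes, assuming a power-of-two target alignment in bytes.

```c
#include <assert.h>

/* Hypothetical helper (not a GCC function): number of scalar iterations
   to peel so that an access starting MISALIGN bytes past a TARGET_ALIGN
   boundary becomes aligned.  TARGET_ALIGN is assumed to be a power of
   two, and MISALIGN a multiple of ELEM_SIZE.  */
unsigned int
peel_iterations (unsigned int misalign, unsigned int target_align,
                 unsigned int elem_size, int negative_step)
{
  assert ((target_align & (target_align - 1)) == 0);
  assert (misalign % elem_size == 0);
  if (misalign == 0)
    return 0;
  /* Walking forwards we need (target_align - misalign) more bytes;
     walking backwards we need MISALIGN bytes.  Unsigned negation wraps,
     so masking yields the forward distance modulo TARGET_ALIGN.  */
  unsigned int mis = negative_step ? misalign : -misalign;
  return (mis & (target_align - 1)) / elem_size;
}

/* Hypothetical helper: the byte alignment that an access is guaranteed
   to have, given its misalignment relative to TARGET_ALIGN.  A negative
   MISALIGN stands for "unknown", in which case only ELEM_ALIGN can be
   assumed.  */
unsigned int
known_alignment (int misalign, unsigned int target_align,
                 unsigned int elem_align)
{
  if (misalign < 0)
    return elem_align;
  if (misalign == 0)
    return target_align;
  /* The lowest set bit of the misalignment is the largest power of two
     that divides the address.  */
  unsigned int m = (unsigned int) misalign;
  return m & -m;
}
```

With 4-byte elements and a misalignment of 4 bytes, a 16-byte target alignment needs 3 peeled iterations, whereas a 32-byte target (a full 256-bit vector) needs 7. If the hook returns element alignment, the misalignment is always zero and nothing is peeled.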

Did you specifically choose to pass the hook a vector type rather than a mode?

I suppose that in peeling for alignment the target should be able to prevent
peeling by returning the element alignment from the hook?

Ok.

Thanks,
Richard.

> Richard
>
>
> 2017-09-18 Richard Sandiford <richard.sandif...@linaro.org>
>            Alan Hayward <alan.hayw...@arm.com>
>            David Sherwood <david.sherw...@arm.com>
>
> gcc/
>         * target.def (preferred_vector_alignment): New hook.
>         * doc/tm.texi.in (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): New
>         hook.
>         * doc/tm.texi: Regenerate.
>         * targhooks.h (default_preferred_vector_alignment): Declare.
>         * targhooks.c (default_preferred_vector_alignment): New function.
>         * tree-vectorizer.h (dataref_aux): Add a target_alignment field.
>         Expand commentary.
>         (DR_TARGET_ALIGNMENT): New macro.
>         (aligned_access_p): Update commentary.
>         (vect_known_alignment_in_bytes): New function.
>         * tree-vect-data-refs.c (vect_calculate_target_alignment): New
>         function.
>         (vect_compute_data_ref_alignment): Set DR_TARGET_ALIGNMENT.
>         Calculate the misalignment based on the target alignment rather than
>         the vector size.
>         (vect_update_misalignment_for_peel): Use DR_TARGET_ALIGNMENT
>         rather than TYPE_ALIGN / BITS_PER_UNIT to update the misalignment.
>         (vect_enhance_data_refs_alignment): Mask the byte misalignment with
>         the target alignment, rather than masking the element misalignment
>         with the number of elements in a vector. Also use the target
>         alignment when calculating the maximum number of peels.
>         (vect_find_same_alignment_drs): Use vect_calculate_target_alignment
>         instead of TYPE_ALIGN_UNIT.
>         (vect_duplicate_ssa_name_ptr_info): Remove stmt_info parameter.
>         Measure DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT.
>         (vect_create_addr_base_for_vector_ref): Update call accordingly.
>         (vect_create_data_ref_ptr): Likewise.
>         (vect_setup_realignment): Realign by ANDing with
>         -DR_TARGET_ALIGNMENT.
>         * tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Calculate
>         the number of peels based on DR_TARGET_ALIGNMENT.
>         * tree-vect-stmts.c (get_group_load_store_type): Compare the gap
>         with the guaranteed alignment boundary when deciding whether
>         overrun is OK.
>         (vectorizable_mask_load_store): Interpret DR_MISALIGNMENT
>         relative to DR_TARGET_ALIGNMENT instead of TYPE_ALIGN_UNIT.
>         (ensure_base_align): Remove stmt_info parameter. Get the
>         target base alignment from DR_TARGET_ALIGNMENT.
>         (vectorizable_store): Update call accordingly. Interpret
>         DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT instead of
>         TYPE_ALIGN_UNIT.
>         (vectorizable_load): Likewise.
>
> gcc/testsuite/
>         * gcc.dg/vect/vect-outer-3a.c: Adjust dump scan for new wording
>         of alignment message.
>         * gcc.dg/vect/vect-outer-3a-big-array.c: Likewise.
>
> Index: gcc/target.def
> ===================================================================
> *** gcc/target.def 2017-09-18 12:56:24.635070853 +0100
> --- gcc/target.def 2017-09-18 12:56:24.847378559 +0100
> *************** misalignment value (@var{misalign}).",
> *** 1820,1825 ****
> --- 1820,1839 ----
>    int, (enum vect_cost_for_stmt type_of_cost, tree vectype, int misalign),
>    default_builtin_vectorization_cost)
>
> + DEFHOOK
> + (preferred_vector_alignment,
> +  "This hook returns the preferred alignment in bits for accesses to\n\
> + vectors of type @var{type} in vectorized code. This might be less than\n\
> + or greater than the ABI-defined value returned by\n\
> + @code{TARGET_VECTOR_ALIGNMENT}. It can be equal to the alignment of\n\
> + a single element, in which case the vectorizer will not try to optimize\n\
> + for alignment.\n\
> + \n\
> + The default hook returns @code{TYPE_ALIGN (@var{type})}, which is\n\
> + correct for most targets.",
> +  HOST_WIDE_INT, (const_tree type),
> +  default_preferred_vector_alignment)
> +
>   /* Return true if vector alignment is reachable (by peeling N
>      iterations) for the given scalar type. */
>   DEFHOOK
> Index: gcc/doc/tm.texi.in
> ===================================================================
> *** gcc/doc/tm.texi.in 2017-09-18 12:56:24.635070853 +0100
> --- gcc/doc/tm.texi.in 2017-09-18 12:56:24.846475122 +0100
> *************** address; but often a machine-dependent
> *** 4086,4091 ****
> --- 4086,4093 ----
>
>   @hook TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
>
> + @hook TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT
> +
>   @hook TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE
>
>   @hook TARGET_VECTORIZE_VEC_PERM_CONST_OK
> Index: gcc/doc/tm.texi
> ===================================================================
> *** gcc/doc/tm.texi 2017-09-18 12:56:24.635070853 +0100
> --- gcc/doc/tm.texi 2017-09-18 12:56:24.846475122 +0100
> *************** For vector memory operations the cost ma
> *** 5754,5759 ****
> --- 5754,5771 ----
>   misalignment value (@var{misalign}).
>   @end deftypefn
>
> + @deftypefn {Target Hook} HOST_WIDE_INT TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT (const_tree @var{type})
> + This hook returns the preferred alignment in bits for accesses to
> + vectors of type @var{type} in vectorized code. This might be less than
> + or greater than the ABI-defined value returned by
> + @code{TARGET_VECTOR_ALIGNMENT}. It can be equal to the alignment of
> + a single element, in which case the vectorizer will not try to optimize
> + for alignment.
> +
> + The default hook returns @code{TYPE_ALIGN (@var{type})}, which is
> + correct for most targets.
> + @end deftypefn
> +
>   @deftypefn {Target Hook} bool TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE (const_tree @var{type}, bool @var{is_packed})
>   Return true if vector alignment is reachable (by peeling N iterations) for
>   the given scalar type @var{type}. @var{is_packed} is false if the scalar
>   access using @var{type} is known to be naturally aligned.
>   @end deftypefn
> Index: gcc/targhooks.h
> ===================================================================
> *** gcc/targhooks.h 2017-09-18 12:56:24.635070853 +0100
> --- gcc/targhooks.h 2017-09-18 12:56:24.847378559 +0100
> *************** extern tree default_builtin_reciprocal (
> *** 95,100 ****
> --- 95,101 ----
>
>   extern HOST_WIDE_INT default_vector_alignment (const_tree);
>
> + extern HOST_WIDE_INT default_preferred_vector_alignment (const_tree);
>   extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
>   extern bool
>   default_builtin_support_vector_misalignment (machine_mode mode,
> Index: gcc/targhooks.c
> ===================================================================
> *** gcc/targhooks.c 2017-09-18 12:56:24.635070853 +0100
> --- gcc/targhooks.c 2017-09-18 12:56:24.847378559 +0100
> *************** default_vector_alignment (const_tree typ
> *** 1175,1180 ****
> --- 1175,1189 ----
>     return align;
>   }
>
> + /* The default implementation of
> +    TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT. */
> +
> + HOST_WIDE_INT
> + default_preferred_vector_alignment (const_tree type)
> + {
> +   return TYPE_ALIGN (type);
> + }
> +
>   /* By default assume vectors of element TYPE require a multiple of the natural
>      alignment of TYPE. TYPE is naturally aligned if IS_PACKED is false. */
>   bool
> Index: gcc/tree-vectorizer.h
> ===================================================================
> *** gcc/tree-vectorizer.h 2017-09-18 12:56:24.635070853 +0100
> --- gcc/tree-vectorizer.h 2017-09-18 12:56:24.850088870 +0100
> *************** #define PURE_SLP_STMT(S)
> *** 790,796 ****
> --- 790,800 ----
>   #define STMT_SLP_TYPE(S) (S)->slp_type
>
>   struct dataref_aux {
> +   /* The misalignment in bytes of the reference, or -1 if not known. */
>     int misalignment;
> +   /* The byte alignment that we'd ideally like the reference to have,
> +      and the value that misalignment is measured against. */
> +   int target_alignment;
>     /* If true the alignment of base_decl needs to be increased. */
>     bool base_misaligned;
>     tree base_decl;
> *************** #define DR_MISALIGNMENT(DR) dr_misalignm
> *** 1037,1043 ****
>   #define SET_DR_MISALIGNMENT(DR, VAL) set_dr_misalignment (DR, VAL)
>   #define DR_MISALIGNMENT_UNKNOWN (-1)
>
> ! /* Return TRUE if the data access is aligned, and FALSE otherwise. */
>
>   static inline bool
>   aligned_access_p (struct data_reference *data_ref_info)
> --- 1041,1051 ----
>   #define SET_DR_MISALIGNMENT(DR, VAL) set_dr_misalignment (DR, VAL)
>   #define DR_MISALIGNMENT_UNKNOWN (-1)
>
> ! /* Only defined once DR_MISALIGNMENT is defined. */
> ! #define DR_TARGET_ALIGNMENT(DR) DR_VECT_AUX (DR)->target_alignment
> !
> ! /* Return true if data access DR is aligned to its target alignment
> !    (which may be less than a full vector). */
>
>   static inline bool
>   aligned_access_p (struct data_reference *data_ref_info)
> *************** known_alignment_for_access_p (struct dat
> *** 1054,1059 ****
> --- 1062,1080 ----
>     return (DR_MISALIGNMENT (data_ref_info) != DR_MISALIGNMENT_UNKNOWN);
>   }
>
> + /* Return the minimum alignment in bytes that the vectorized version
> +    of DR is guaranteed to have. */
> +
> + static inline unsigned int
> + vect_known_alignment_in_bytes (struct data_reference *dr)
> + {
> +   if (DR_MISALIGNMENT (dr) == DR_MISALIGNMENT_UNKNOWN)
> +     return TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr)));
> +   if (DR_MISALIGNMENT (dr) == 0)
> +     return DR_TARGET_ALIGNMENT (dr);
> +   return DR_MISALIGNMENT (dr) & -DR_MISALIGNMENT (dr);
> + }
> +
>   /* Return the behavior of DR with respect to the vectorization context
>      (which for outer loop vectorization might not be the behavior recorded
>      in DR itself). */
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> *** gcc/tree-vect-data-refs.c 2017-09-18 12:56:24.635070853 +0100
> --- gcc/tree-vect-data-refs.c 2017-09-18 12:56:24.849185433 +0100
> *************** vect_record_base_alignments (vec_info *v
> *** 775,780 ****
> --- 775,791 ----
>       }
>   }
>
> + /* Return the target alignment for the vectorized form of DR. */
> +
> + static unsigned int
> + vect_calculate_target_alignment (struct data_reference *dr)
> + {
> +   gimple *stmt = DR_STMT (dr);
> +   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +   return targetm.vectorize.preferred_vector_alignment (vectype);
> + }
> +
>   /* Function vect_compute_data_ref_alignment
>
>      Compute the misalignment of the data reference DR.
> *************** vect_compute_data_ref_alignment (struct
> *** 811,816 ****
> --- 822,831 ----
>     innermost_loop_behavior *drb = vect_dr_behavior (dr);
>     bool step_preserves_misalignment_p;
>
> +   unsigned HOST_WIDE_INT vector_alignment
> +     = vect_calculate_target_alignment (dr) / BITS_PER_UNIT;
> +   DR_TARGET_ALIGNMENT (dr) = vector_alignment;
> +
>     /* No step for BB vectorization. */
>     if (!loop)
>       {
> *************** vect_compute_data_ref_alignment (struct
> *** 823,865 ****
>        relative to the outer-loop (LOOP). This is ok only if the misalignment
>        stays the same throughout the execution of the inner-loop, which is why
>        we have to check that the stride of the dataref in the inner-loop evenly
> !      divides by the vector size. */
>     else if (nested_in_vect_loop_p (loop, stmt))
>       {
>         step_preserves_misalignment_p
> !         = (DR_STEP_ALIGNMENT (dr)
> !            % GET_MODE_SIZE (TYPE_MODE (vectype))) == 0;
>
>         if (dump_enabled_p ())
>           {
>             if (step_preserves_misalignment_p)
>               dump_printf_loc (MSG_NOTE, vect_location,
> !                              "inner step divides the vector-size.\n");
>             else
>               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> !                              "inner step doesn't divide the vector-size.\n");
>           }
>       }
>
>     /* Similarly we can only use base and misalignment information relative to
>        an innermost loop if the misalignment stays the same throughout the
>        execution of the loop. As above, this is the case if the stride of
> !      the dataref evenly divides by the vector size. */
>     else
>       {
>         unsigned vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>         step_preserves_misalignment_p
> !         = ((DR_STEP_ALIGNMENT (dr) * vf)
> !            % GET_MODE_SIZE (TYPE_MODE (vectype))) == 0;
>
>         if (!step_preserves_misalignment_p && dump_enabled_p ())
>           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> !                          "step doesn't divide the vector-size.\n");
>       }
>
>     unsigned int base_alignment = drb->base_alignment;
>     unsigned int base_misalignment = drb->base_misalignment;
> -   unsigned HOST_WIDE_INT vector_alignment = TYPE_ALIGN_UNIT (vectype);
>
>     /* Calculate the maximum of the pooled base address alignment and the
>        alignment that we can compute for DR itself. */
> --- 838,878 ----
>        relative to the outer-loop (LOOP). This is ok only if the misalignment
>        stays the same throughout the execution of the inner-loop, which is why
>        we have to check that the stride of the dataref in the inner-loop evenly
> !      divides by the vector alignment. */
>     else if (nested_in_vect_loop_p (loop, stmt))
>       {
>         step_preserves_misalignment_p
> !         = (DR_STEP_ALIGNMENT (dr) % vector_alignment) == 0;
>
>         if (dump_enabled_p ())
>           {
>             if (step_preserves_misalignment_p)
>               dump_printf_loc (MSG_NOTE, vect_location,
> !                              "inner step divides the vector alignment.\n");
>             else
>               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> !                              "inner step doesn't divide the vector"
> !                              " alignment.\n");
>           }
>       }
>
>     /* Similarly we can only use base and misalignment information relative to
>        an innermost loop if the misalignment stays the same throughout the
>        execution of the loop. As above, this is the case if the stride of
> !      the dataref evenly divides by the alignment. */
>     else
>       {
>         unsigned vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>         step_preserves_misalignment_p
> !         = ((DR_STEP_ALIGNMENT (dr) * vf) % vector_alignment) == 0;
>
>         if (!step_preserves_misalignment_p && dump_enabled_p ())
>           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> !                          "step doesn't divide the vector alignment.\n");
>       }
>
>     unsigned int base_alignment = drb->base_alignment;
>     unsigned int base_misalignment = drb->base_misalignment;
>
>     /* Calculate the maximum of the pooled base address alignment and the
>        alignment that we can compute for DR itself. */
> *************** vect_update_misalignment_for_peel (struc
> *** 1007,1015 ****
>       {
>         bool negative = tree_int_cst_compare (DR_STEP (dr), size_zero_node) < 0;
>         int misal = DR_MISALIGNMENT (dr);
> -       tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>         misal += negative ? -npeel * dr_size : npeel * dr_size;
> !       misal &= (TYPE_ALIGN (vectype) / BITS_PER_UNIT) - 1;
>         SET_DR_MISALIGNMENT (dr, misal);
>         return;
>       }
> --- 1020,1027 ----
>       {
>         bool negative = tree_int_cst_compare (DR_STEP (dr), size_zero_node) < 0;
>         int misal = DR_MISALIGNMENT (dr);
>         misal += negative ? -npeel * dr_size : npeel * dr_size;
> !       misal &= DR_TARGET_ALIGNMENT (dr) - 1;
>         SET_DR_MISALIGNMENT (dr, misal);
>         return;
>       }
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1657,1672 ****
>           {
>             if (known_alignment_for_access_p (dr))
>               {
> !               unsigned int npeel_tmp = 0;
>                 bool negative = tree_int_cst_compare (DR_STEP (dr),
>                                                       size_zero_node) < 0;
>
> !               vectype = STMT_VINFO_VECTYPE (stmt_info);
> !               nelements = TYPE_VECTOR_SUBPARTS (vectype);
> !               mis = DR_MISALIGNMENT (dr) / vect_get_scalar_dr_size (dr);
>                 if (DR_MISALIGNMENT (dr) != 0)
> !                 npeel_tmp = (negative ? (mis - nelements)
> !                              : (nelements - mis)) & (nelements - 1);
>
>                 /* For multiple types, it is possible that the bigger type access
>                    will have more than one peeling option. E.g., a loop with two
> --- 1669,1685 ----
>           {
>             if (known_alignment_for_access_p (dr))
>               {
> !               unsigned int npeel_tmp = 0;
>                 bool negative = tree_int_cst_compare (DR_STEP (dr),
>                                                       size_zero_node) < 0;
>
> !               vectype = STMT_VINFO_VECTYPE (stmt_info);
> !               nelements = TYPE_VECTOR_SUBPARTS (vectype);
> !               unsigned int target_align = DR_TARGET_ALIGNMENT (dr);
> !               unsigned int dr_size = vect_get_scalar_dr_size (dr);
> !               mis = (negative ? DR_MISALIGNMENT (dr) : -DR_MISALIGNMENT (dr));
>                 if (DR_MISALIGNMENT (dr) != 0)
> !                 npeel_tmp = (mis & (target_align - 1)) / dr_size;
>
>                 /* For multiple types, it is possible that the bigger type access
>                    will have more than one peeling option. E.g., a loop with two
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1701,1707 ****
>                   {
>                     vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
>                                               dr, npeel_tmp);
> !                   npeel_tmp += nelements;
>                   }
>
>                 one_misalignment_known = true;
> --- 1714,1720 ----
>                   {
>                     vect_peeling_hash_insert (&peeling_htab, loop_vinfo,
>                                               dr, npeel_tmp);
> !                   npeel_tmp += target_align / dr_size;
>                   }
>
>                 one_misalignment_known = true;
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1922,1928 ****
>         stmt = DR_STMT (dr0);
>         stmt_info = vinfo_for_stmt (stmt);
>         vectype = STMT_VINFO_VECTYPE (stmt_info);
> -       nelements = TYPE_VECTOR_SUBPARTS (vectype);
>
>         if (known_alignment_for_access_p (dr0))
>           {
> --- 1935,1940 ----
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1935,1943 ****
>                    updating DR_MISALIGNMENT values. The peeling factor is the
>                    vectorization factor minus the misalignment as an element
>                    count. */
> !               mis = DR_MISALIGNMENT (dr0) / vect_get_scalar_dr_size (dr0);
> !               npeel = ((negative ? mis - nelements : nelements - mis)
> !                        & (nelements - 1));
>               }
>
>             /* For interleaved data access every iteration accesses all the
> --- 1947,1956 ----
>                    updating DR_MISALIGNMENT values. The peeling factor is the
>                    vectorization factor minus the misalignment as an element
>                    count. */
> !               mis = negative ? DR_MISALIGNMENT (dr0) : -DR_MISALIGNMENT (dr0);
> !               unsigned int target_align = DR_TARGET_ALIGNMENT (dr0);
> !               npeel = ((mis & (target_align - 1))
> !                        / vect_get_scalar_dr_size (dr0));
>               }
>
>             /* For interleaved data access every iteration accesses all the
> *************** vect_enhance_data_refs_alignment (loop_v
> *** 1976,1985 ****
>             unsigned max_peel = npeel;
>             if (max_peel == 0)
>               {
> !               gimple *dr_stmt = DR_STMT (dr0);
> !               stmt_vec_info vinfo = vinfo_for_stmt (dr_stmt);
> !               tree vtype = STMT_VINFO_VECTYPE (vinfo);
> !               max_peel = TYPE_VECTOR_SUBPARTS (vtype) - 1;
>               }
>             if (max_peel > max_allowed_peel)
>               {
> --- 1989,1996 ----
>             unsigned max_peel = npeel;
>             if (max_peel == 0)
>               {
> !               unsigned int target_align = DR_TARGET_ALIGNMENT (dr0);
> !               max_peel = target_align / vect_get_scalar_dr_size (dr0) - 1;
>               }
>             if (max_peel > max_allowed_peel)
>               {
> *************** vect_find_same_alignment_drs (struct dat
> *** 2201,2208 ****
>     if (diff != 0)
>       {
>         /* Get the wider of the two alignments. */
> !       unsigned int align_a = TYPE_ALIGN_UNIT (STMT_VINFO_VECTYPE (stmtinfo_a));
> !       unsigned int align_b = TYPE_ALIGN_UNIT (STMT_VINFO_VECTYPE (stmtinfo_b));
>         unsigned int max_align = MAX (align_a, align_b);
>
>         /* Require the gap to be a multiple of the larger vector alignment. */
> --- 2212,2221 ----
>     if (diff != 0)
>       {
>         /* Get the wider of the two alignments. */
> !       unsigned int align_a = (vect_calculate_target_alignment (dra)
> !                               / BITS_PER_UNIT);
> !       unsigned int align_b = (vect_calculate_target_alignment (drb)
> !                               / BITS_PER_UNIT);
>         unsigned int max_align = MAX (align_a, align_b);
>
>         /* Require the gap to be a multiple of the larger vector alignment. */
> *************** vect_get_new_ssa_name (tree type, enum v
> *** 3995,4010 ****
>   /* Duplicate ptr info and set alignment/misaligment on NAME from DR. */
>
>   static void
> ! vect_duplicate_ssa_name_ptr_info (tree name, data_reference *dr,
> !                                   stmt_vec_info stmt_info)
>   {
>     duplicate_ssa_name_ptr_info (name, DR_PTR_INFO (dr));
> -   unsigned int align = TYPE_ALIGN_UNIT (STMT_VINFO_VECTYPE (stmt_info));
>     int misalign = DR_MISALIGNMENT (dr);
>     if (misalign == DR_MISALIGNMENT_UNKNOWN)
>       mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (name));
>     else
> !     set_ptr_info_alignment (SSA_NAME_PTR_INFO (name), align, misalign);
>   }
>
>   /* Function vect_create_addr_base_for_vector_ref.
> --- 4008,4022 ----
>   /* Duplicate ptr info and set alignment/misaligment on NAME from DR. */
>
>   static void
> ! vect_duplicate_ssa_name_ptr_info (tree name, data_reference *dr)
>   {
>     duplicate_ssa_name_ptr_info (name, DR_PTR_INFO (dr));
>     int misalign = DR_MISALIGNMENT (dr);
>     if (misalign == DR_MISALIGNMENT_UNKNOWN)
>       mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (name));
>     else
> !     set_ptr_info_alignment (SSA_NAME_PTR_INFO (name),
> !                             DR_TARGET_ALIGNMENT (dr), misalign);
>   }
>
>   /* Function vect_create_addr_base_for_vector_ref.
> *************** vect_create_addr_base_for_vector_ref (gi
> *** 4109,4115 ****
>         && TREE_CODE (addr_base) == SSA_NAME
>         && !SSA_NAME_PTR_INFO (addr_base))
>       {
> !       vect_duplicate_ssa_name_ptr_info (addr_base, dr, stmt_info);
>         if (offset || byte_offset)
>           mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (addr_base));
>       }
> --- 4121,4127 ----
>         && TREE_CODE (addr_base) == SSA_NAME
>         && !SSA_NAME_PTR_INFO (addr_base))
>       {
> !       vect_duplicate_ssa_name_ptr_info (addr_base, dr);
>         if (offset || byte_offset)
>           mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (addr_base));
>       }
> *************** vect_create_data_ref_ptr (gimple *stmt,
> *** 4368,4375 ****
>         /* Copy the points-to information if it exists. */
>         if (DR_PTR_INFO (dr))
>           {
> !           vect_duplicate_ssa_name_ptr_info (indx_before_incr, dr, stmt_info);
> !           vect_duplicate_ssa_name_ptr_info (indx_after_incr, dr, stmt_info);
>           }
>         if (ptr_incr)
>           *ptr_incr = incr;
> --- 4380,4387 ----
>         /* Copy the points-to information if it exists. */
>         if (DR_PTR_INFO (dr))
>           {
> !           vect_duplicate_ssa_name_ptr_info (indx_before_incr, dr);
> !           vect_duplicate_ssa_name_ptr_info (indx_after_incr, dr);
>           }
>         if (ptr_incr)
>           *ptr_incr = incr;
> *************** vect_create_data_ref_ptr (gimple *stmt,
> *** 4398,4405 ****
>         /* Copy the points-to information if it exists. */
>         if (DR_PTR_INFO (dr))
>           {
> !           vect_duplicate_ssa_name_ptr_info (indx_before_incr, dr, stmt_info);
> !           vect_duplicate_ssa_name_ptr_info (indx_after_incr, dr, stmt_info);
>           }
>         if (ptr_incr)
>           *ptr_incr = incr;
> --- 4410,4417 ----
>         /* Copy the points-to information if it exists. */
>         if (DR_PTR_INFO (dr))
>           {
> !           vect_duplicate_ssa_name_ptr_info (indx_before_incr, dr);
> !           vect_duplicate_ssa_name_ptr_info (indx_after_incr, dr);
>           }
>         if (ptr_incr)
>           *ptr_incr = incr;
> *************** vect_setup_realignment (gimple *stmt, gi
> *** 5003,5012 ****
>           new_temp = copy_ssa_name (ptr);
>         else
>           new_temp = make_ssa_name (TREE_TYPE (ptr));
>         new_stmt = gimple_build_assign
>                      (new_temp, BIT_AND_EXPR, ptr,
> !                     build_int_cst (TREE_TYPE (ptr),
> !                                    -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
>         new_bb = gsi_insert_on_edge_immediate (pe, new_stmt);
>         gcc_assert (!new_bb);
>         data_ref
> --- 5015,5024 ----
>           new_temp = copy_ssa_name (ptr);
>         else
>           new_temp = make_ssa_name (TREE_TYPE (ptr));
> +       unsigned int align = DR_TARGET_ALIGNMENT (dr);
>         new_stmt = gimple_build_assign
>                      (new_temp, BIT_AND_EXPR, ptr,
> !                     build_int_cst (TREE_TYPE (ptr), -(HOST_WIDE_INT) align));
>         new_bb = gsi_insert_on_edge_immediate (pe, new_stmt);
>         gcc_assert (!new_bb);
>         data_ref
> Index: gcc/tree-vect-loop-manip.c
> ===================================================================
> *** gcc/tree-vect-loop-manip.c 2017-09-18 12:56:24.635070853 +0100
> --- gcc/tree-vect-loop-manip.c 2017-09-18 12:56:24.849185433 +0100
> *************** vect_gen_prolog_loop_niters (loop_vec_in
> *** 956,963 ****
>     gimple *dr_stmt = DR_STMT (dr);
>     stmt_vec_info stmt_info = vinfo_for_stmt (dr_stmt);
>     tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> !   int vectype_align = TYPE_ALIGN (vectype) / BITS_PER_UNIT;
> !   int nelements = TYPE_VECTOR_SUBPARTS (vectype);
>
>     if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
>       {
> --- 956,962 ----
>     gimple *dr_stmt = DR_STMT (dr);
>     stmt_vec_info stmt_info = vinfo_for_stmt (dr_stmt);
>     tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> !   unsigned int target_align = DR_TARGET_ALIGNMENT (dr);
>
>     if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
>       {
> *************** vect_gen_prolog_loop_niters (loop_vec_in
> *** 978,1009 ****
>         tree start_addr = vect_create_addr_base_for_vector_ref (dr_stmt,
>                                                                 &stmts, offset);
>         tree type = unsigned_type_for (TREE_TYPE (start_addr));
> !       tree vectype_align_minus_1 = build_int_cst (type, vectype_align - 1);
> !       HOST_WIDE_INT elem_size =
> !         int_cst_value (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
>         tree elem_size_log = build_int_cst (type, exact_log2 (elem_size));
> !       tree nelements_minus_1 = build_int_cst (type, nelements - 1);
> !       tree nelements_tree = build_int_cst (type, nelements);
> !       tree byte_misalign;
> !       tree elem_misalign;
> !
> !       /* Create: byte_misalign = addr & (vectype_align - 1) */
> !       byte_misalign =
> !         fold_build2 (BIT_AND_EXPR, type, fold_convert (type, start_addr),
> !                      vectype_align_minus_1);
> !
> !       /* Create: elem_misalign = byte_misalign / element_size */
> !       elem_misalign =
> !         fold_build2 (RSHIFT_EXPR, type, byte_misalign, elem_size_log);
>
> !       /* Create: (niters_type) (nelements - elem_misalign)&(nelements - 1) */
>         if (negative)
> !         iters = fold_build2 (MINUS_EXPR, type, elem_misalign, nelements_tree);
>         else
> !         iters = fold_build2 (MINUS_EXPR, type, nelements_tree, elem_misalign);
> !       iters = fold_build2 (BIT_AND_EXPR, type, iters, nelements_minus_1);
>         iters = fold_convert (niters_type, iters);
> !       *bound = nelements - 1;
>       }
>
>     if (dump_enabled_p ())
> --- 977,1012 ----
>         tree start_addr = vect_create_addr_base_for_vector_ref (dr_stmt,
>                                                                 &stmts, offset);
>         tree type = unsigned_type_for (TREE_TYPE (start_addr));
> !       tree target_align_minus_1 = build_int_cst (type, target_align - 1);
> !       HOST_WIDE_INT elem_size
> !         = int_cst_value (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
>         tree elem_size_log = build_int_cst (type, exact_log2 (elem_size));
> !       HOST_WIDE_INT align_in_elems = target_align / elem_size;
> !       tree align_in_elems_minus_1 = build_int_cst (type, align_in_elems - 1);
> !       tree align_in_elems_tree = build_int_cst (type, align_in_elems);
> !       tree misalign_in_bytes;
> !       tree misalign_in_elems;
> !
> !       /* Create: misalign_in_bytes = addr & (target_align - 1). */
> !       misalign_in_bytes
> !         = fold_build2 (BIT_AND_EXPR, type, fold_convert (type, start_addr),
> !                        target_align_minus_1);
> !
> !       /* Create: misalign_in_elems = misalign_in_bytes / element_size. */
> !       misalign_in_elems
> !         = fold_build2 (RSHIFT_EXPR, type, misalign_in_bytes, elem_size_log);
>
> !       /* Create: (niters_type) ((align_in_elems - misalign_in_elems)
> !                                 & (align_in_elems - 1)). */
>         if (negative)
> !         iters = fold_build2 (MINUS_EXPR, type, misalign_in_elems,
> !                              align_in_elems_tree);
>         else
> !         iters = fold_build2 (MINUS_EXPR, type, align_in_elems_tree,
> !                              misalign_in_elems);
> !       iters = fold_build2 (BIT_AND_EXPR, type, iters, align_in_elems_minus_1);
>         iters = fold_convert (niters_type, iters);
> !       *bound = align_in_elems - 1;
>       }
>
>     if (dump_enabled_p ())
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> *** gcc/tree-vect-stmts.c 2017-09-18 12:56:24.635070853 +0100
> --- gcc/tree-vect-stmts.c 2017-09-18 12:56:24.850088870 +0100
> *************** get_group_load_store_type (gimple *stmt,
> *** 1737,1742 ****
> --- 1737,1743 ----
>     loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>     struct loop *loop = loop_vinfo ? LOOP_VINFO_LOOP (loop_vinfo) : NULL;
>     gimple *first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
> +   data_reference *first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
>     unsigned int group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
>     bool single_element_p = (stmt == first_stmt
>                              && !GROUP_NEXT_ELEMENT (stmt_info));
> *************** get_group_load_store_type (gimple *stmt,
> *** 1780,1789 ****
>                                  " non-consecutive accesses\n");
>                 return false;
>               }
> !           /* If the access is aligned an overrun is fine. */
>             if (overrun_p
> !               && aligned_access_p
> !                    (STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt))))
>               overrun_p = false;
>             if (overrun_p && !can_overrun_p)
>               {
> --- 1781,1793 ----
>                                  " non-consecutive accesses\n");
>                 return false;
>               }
> !           /* An overrun is fine if the trailing elements are smaller
> !              than the alignment boundary B. Every vector access will
> !              be a multiple of B and so we are guaranteed to access a
> !              non-gap element in the same B-sized block. */
>             if (overrun_p
> !               && gap < (vect_known_alignment_in_bytes (first_dr)
> !                         / vect_get_scalar_dr_size (first_dr)))
>               overrun_p = false;
>             if (overrun_p && !can_overrun_p)
>               {
> *************** get_group_load_store_type (gimple *stmt,
> *** 1804,1817 ****
>         /* If there is a gap at the end of the group then these optimizations
>            would access excess elements in the last iteration. */
>         bool would_overrun_p = (gap != 0);
> !       /* If the access is aligned an overrun is fine, but only if the
> !          overrun is not inside an unused vector (if the gap is as large
> !          or larger than a vector). */
>         if (would_overrun_p
> !           && gap < nunits
> !           && aligned_access_p
> !                (STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt))))
>           would_overrun_p = false;
>         if (!STMT_VINFO_STRIDED_P (stmt_info)
>             && (can_overrun_p || !would_overrun_p)
>             && compare_step_with_zero (stmt) > 0)
> --- 1808,1822 ----
>         /* If there is a gap at the end of the group then these optimizations
>            would access excess elements in the last iteration. */
>         bool would_overrun_p = (gap != 0);
> !       /* An overrun is fine if the trailing elements are smaller than the
> !          alignment boundary B. Every vector access will be a multiple of B
> !          and so we are guaranteed to access a non-gap element in the
> !          same B-sized block. */
>         if (would_overrun_p
> !           && gap < (vect_known_alignment_in_bytes (first_dr)
> !                     / vect_get_scalar_dr_size (first_dr)))
>           would_overrun_p = false;
> +
>         if (!STMT_VINFO_STRIDED_P (stmt_info)
>             && (can_overrun_p || !would_overrun_p)
>             && compare_step_with_zero (stmt) > 0)
> *************** vectorizable_mask_load_store (gimple *st
> *** 2351,2357 ****
>                                            TYPE_SIZE_UNIT (vectype));
>               }
>
> !           align = TYPE_ALIGN_UNIT (vectype);
>             if (aligned_access_p (dr))
>               misalign = 0;
>             else if (DR_MISALIGNMENT (dr) == -1)
> --- 2356,2362 ----
>                                            TYPE_SIZE_UNIT (vectype));
>               }
>
> !           align = DR_TARGET_ALIGNMENT (dr);
>             if (aligned_access_p (dr))
>               misalign = 0;
>             else if (DR_MISALIGNMENT (dr) == -1)
> *************** vectorizable_mask_load_store (gimple *st
> *** 2404,2410 ****
>                                            TYPE_SIZE_UNIT (vectype));
>               }
>
> !           align = TYPE_ALIGN_UNIT (vectype);
>             if (aligned_access_p (dr))
>               misalign = 0;
>             else if (DR_MISALIGNMENT (dr) == -1)
> --- 2409,2415 ----
>                                            TYPE_SIZE_UNIT (vectype));
>               }
>
> !           align = DR_TARGET_ALIGNMENT (dr);
>             if (aligned_access_p (dr))
>               misalign = 0;
>             else if (DR_MISALIGNMENT (dr) == -1)
> *************** vectorizable_operation (gimple *stmt, gi
> *** 5553,5577 ****
>     return true;
>   }
>
> ! /* A helper function to ensure data reference DR's base alignment
> !    for STMT_INFO. */
>
>   static void
> ! ensure_base_align (stmt_vec_info stmt_info, struct data_reference *dr)
>   {
>     if (!dr->aux)
>       return;
>
>     if (DR_VECT_AUX (dr)->base_misaligned)
>       {
> -       tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>         tree base_decl = DR_VECT_AUX (dr)->base_decl;
>
>         if (decl_in_symtab_p (base_decl))
> !         symtab_node::get (base_decl)->increase_alignment (TYPE_ALIGN (vectype));
>         else
>           {
> !           SET_DECL_ALIGN (base_decl, TYPE_ALIGN (vectype));
>             DECL_USER_ALIGN (base_decl) = 1;
>           }
>         DR_VECT_AUX (dr)->base_misaligned = false;
> --- 5558,5582 ----
>     return true;
>   }
>
> ! /* A helper function to ensure data reference DR's base alignment. */
>
>   static void
> ! ensure_base_align (struct data_reference *dr)
>   {
>     if (!dr->aux)
>       return;
>
>     if (DR_VECT_AUX (dr)->base_misaligned)
>       {
>         tree base_decl = DR_VECT_AUX (dr)->base_decl;
>
> +       unsigned int align_base_to = DR_TARGET_ALIGNMENT (dr) * BITS_PER_UNIT;
> +
>         if (decl_in_symtab_p (base_decl))
> !         symtab_node::get (base_decl)->increase_alignment (align_base_to);
>         else
>           {
> !           SET_DECL_ALIGN (base_decl, align_base_to);
>             DECL_USER_ALIGN (base_decl) = 1;
>           }
>         DR_VECT_AUX (dr)->base_misaligned = false;
> *************** vectorizable_store (gimple *stmt, gimple
> *** 5775,5781 ****
>
>     /* Transform. */
>
> !   ensure_base_align (stmt_info, dr);
>
>     if (memory_access_type == VMAT_GATHER_SCATTER)
>       {
> --- 5780,5786 ----
>
>     /* Transform. */
>
> !   ensure_base_align (dr);
>
>     if (memory_access_type == VMAT_GATHER_SCATTER)
>       {
> *************** vectorizable_store (gimple *stmt, gimple
> *** 6417,6423 ****
>                                 dataref_offset
>                                 ? dataref_offset
>                                 : build_int_cst (ref_type, 0));
> !             align = TYPE_ALIGN_UNIT (vectype);
>               if (aligned_access_p (first_dr))
>                 misalign = 0;
>               else if (DR_MISALIGNMENT (first_dr) == -1)
> --- 6422,6428 ----
>                                 dataref_offset
>                                 ? dataref_offset
>                                 : build_int_cst (ref_type, 0));
> !             align = DR_TARGET_ALIGNMENT (first_dr);
>               if (aligned_access_p (first_dr))
>                 misalign = 0;
>               else if (DR_MISALIGNMENT (first_dr) == -1)
> *************** vectorizable_load (gimple *stmt, gimple_
> *** 6813,6819 ****
>
>     /* Transform. */
>
> !   ensure_base_align (stmt_info, dr);
>
>     if (memory_access_type == VMAT_GATHER_SCATTER)
>       {
> --- 6818,6824 ----
>
>     /* Transform. */
>
> !   ensure_base_align (dr);
>
>     if (memory_access_type == VMAT_GATHER_SCATTER)
>       {
> *************** vectorizable_load (gimple *stmt, gimple_
> *** 7512,7518 ****
>                                   dataref_offset
>                                   ? dataref_offset
>                                   : build_int_cst (ref_type, 0));
> !               align = TYPE_ALIGN_UNIT (vectype);
>                 if (alignment_support_scheme == dr_aligned)
>                   {
>                     gcc_assert (aligned_access_p (first_dr));
> --- 7517,7523 ----
>                                   dataref_offset
>                                   ? dataref_offset
>                                   : build_int_cst (ref_type, 0));
> !               align = DR_TARGET_ALIGNMENT (dr);
>                 if (alignment_support_scheme == dr_aligned)
>                   {
>                     gcc_assert (aligned_access_p (first_dr));
> *************** vectorizable_load (gimple *stmt, gimple_
> *** 7555,7565 ****
>                     ptr = copy_ssa_name (dataref_ptr);
>                   else
>                     ptr = make_ssa_name (TREE_TYPE (dataref_ptr));
>                   new_stmt = gimple_build_assign
>                                (ptr, BIT_AND_EXPR, dataref_ptr,
>                                 build_int_cst
>                                 (TREE_TYPE (dataref_ptr),
> !                                -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
>                   vect_finish_stmt_generation (stmt, new_stmt, gsi);
>                   data_ref
>                     = build2 (MEM_REF, vectype, ptr,
> --- 7560,7571 ----
>                     ptr = copy_ssa_name (dataref_ptr);
>                   else
>                     ptr = make_ssa_name (TREE_TYPE (dataref_ptr));
> +                 unsigned int align = DR_TARGET_ALIGNMENT (first_dr);
>                   new_stmt = gimple_build_assign
>                                (ptr, BIT_AND_EXPR, dataref_ptr,
>                                 build_int_cst
>                                 (TREE_TYPE (dataref_ptr),
> !                                -(HOST_WIDE_INT) align));
>                   vect_finish_stmt_generation (stmt, new_stmt, gsi);
>                   data_ref
>                     = build2 (MEM_REF, vectype, ptr,
> *************** vectorizable_load (gimple *stmt, gimple_
> *** 7581,7588 ****
>                   new_stmt = gimple_build_assign
>                                (NULL_TREE, BIT_AND_EXPR, ptr,
>                                 build_int_cst
> !                               (TREE_TYPE (ptr),
> !                                -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
>                   ptr = copy_ssa_name (ptr, new_stmt);
>                   gimple_assign_set_lhs (new_stmt, ptr);
>                   vect_finish_stmt_generation (stmt, new_stmt, gsi);
> --- 7587,7593 ----
>                   new_stmt = gimple_build_assign
>                                (NULL_TREE, BIT_AND_EXPR, ptr,
>                                 build_int_cst
> !                               (TREE_TYPE (ptr), -(HOST_WIDE_INT) align));
>                   ptr = copy_ssa_name (ptr, new_stmt);
>                   gimple_assign_set_lhs (new_stmt, ptr);
>                   vect_finish_stmt_generation (stmt, new_stmt, gsi);
> *************** vectorizable_load (gimple *stmt, gimple_
> *** 7592,7611 ****
>                   break;
>                 }
>               case dr_explicit_realign_optimized:
> !               if (TREE_CODE (dataref_ptr) == SSA_NAME)
> !                 new_temp = copy_ssa_name (dataref_ptr);
> !               else
> !                 new_temp = make_ssa_name (TREE_TYPE (dataref_ptr));
> !               new_stmt = gimple_build_assign
> !                            (new_temp, BIT_AND_EXPR, dataref_ptr,
> !                             build_int_cst
> !                             (TREE_TYPE (dataref_ptr),
> !                              -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
> !               vect_finish_stmt_generation (stmt, new_stmt, gsi);
> !               data_ref
> !                 = build2 (MEM_REF, vectype, new_temp,
> !                           build_int_cst (ref_type, 0));
> !               break;
>               default:
>                 gcc_unreachable ();
>               }
> --- 7597,7618 ----
>                   break;
>                 }
>               case dr_explicit_realign_optimized:
> !               {
> !                 if (TREE_CODE (dataref_ptr) == SSA_NAME)
> !                   new_temp = copy_ssa_name (dataref_ptr);
> !                 else
> !                   new_temp = make_ssa_name (TREE_TYPE (dataref_ptr));
> !                 unsigned int align = DR_TARGET_ALIGNMENT (first_dr);
> !                 new_stmt = gimple_build_assign
> !                              (new_temp, BIT_AND_EXPR, dataref_ptr,
> !                               build_int_cst (TREE_TYPE (dataref_ptr),
> !                                              -(HOST_WIDE_INT) align));
> !                 vect_finish_stmt_generation (stmt, new_stmt, gsi);
> !                 data_ref
> !                   = build2 (MEM_REF, vectype, new_temp,
> !                             build_int_cst (ref_type, 0));
> !                 break;
> !
} > default: > gcc_unreachable (); > } > Index: gcc/testsuite/gcc.dg/vect/vect-outer-3a.c > =================================================================== > *** gcc/testsuite/gcc.dg/vect/vect-outer-3a.c 2017-09-18 12:56:24.635070853 > +0100 > --- gcc/testsuite/gcc.dg/vect/vect-outer-3a.c 2017-09-18 12:56:24.849185433 > +0100 > *************** int main (void) > *** 49,52 **** > } > > /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { > xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ > ! /* { dg-final { scan-tree-dump-times "step doesn't divide the vector-size" > 1 "vect" } } */ > --- 49,52 ---- > } > > /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { > xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ > ! /* { dg-final { scan-tree-dump-times "step doesn't divide the vector > alignment" 1 "vect" } } */ > Index: gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c > =================================================================== > *** gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c 2017-09-18 > 12:56:24.635070853 +0100 > --- gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c 2017-09-18 > 12:56:24.847378559 +0100 > *************** int main (void) > *** 49,52 **** > } > > /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { > xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ > ! /* { dg-final { scan-tree-dump-times "step doesn't divide the vector-size" > 1 "vect" } } */ > --- 49,52 ---- > } > > /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { > xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ > ! /* { dg-final { scan-tree-dump-times "step doesn't divide the vector > alignment" 1 "vect" } } */
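For anyone reading the tree-vect-stmts.c hunks without the rest of the series, here is a rough standalone sketch of the two pieces of arithmetic they rely on: the relaxed would_overrun_p test and the realignment mask built from -(HOST_WIDE_INT) align. The function names overrun_is_harmless and align_down are made up for illustration and are not part of the patch. Plain unsigned integers stand in for the data reference, so this only approximates what the vectoriser does, and it assumes the alignment is a nonzero power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the patched would_overrun_p test.
   GAP is the number of trailing gap elements in the group,
   KNOWN_ALIGN_BYTES is the alignment the access is known to have and
   SCALAR_SIZE_BYTES is the size of one element.  If the gap is smaller
   than the number of elements in one alignment block, the overrun
   stays within a block that also contains a non-gap element.  */
static inline bool
overrun_is_harmless (unsigned int gap, unsigned int known_align_bytes,
                     unsigned int scalar_size_bytes)
{
  return gap < known_align_bytes / scalar_size_bytes;
}

/* Hypothetical stand-in for the BIT_AND_EXPR the patch emits: round
   ADDR down to a multiple of ALIGN by ANDing with -ALIGN.  This is
   only valid when ALIGN is a nonzero power of two (an assumption of
   this sketch, checked by the assert).  */
static inline uintptr_t
align_down (uintptr_t addr, unsigned int align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  /* Unsigned negation yields a mask with the low log2(ALIGN) bits clear.  */
  return addr & -(uintptr_t) align;
}
```

With a 16-byte known alignment and 4-byte elements, a gap of up to 3 elements passes the first test and a gap of 4 does not. The second helper clears the low four bits of the address for a 16-byte target alignment.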