On Thu, 21 Nov 2024, Alex Coplan wrote: > On 21/11/2024 10:02, Richard Biener wrote: > > On Fri, 15 Nov 2024, Alex Coplan wrote: > > > > > Hi, > > > > > > This is a v2 which hopefully addresses the feedback for v1 of the 1/5 > > > patch, originally posted here: > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666648.html > > > > > > As mentioned on IRC, it will need follow-up work to fix up latent > > > profile issues, but that can be done during stage 3. We will also need > > > a simple (hopefully obvious, even) follow-up patch to fix expectations > > > for various tests (since we now vectorize loops which we previously > > > couldn't). > > > > > > OK for trunk? > > > > I'm still looking at > > > > + if (dr_info->need_peeling_for_alignment) > > + { > > + /* Vector size in bytes. */ > > + poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT > > (vectype)); > > + > > + /* We can only peel for loops, of course. */ > > + gcc_checking_assert (loop_vinfo); > > + > > + /* Calculate the number of vectors read per vector iteration. If > > + it is a power of two, multiply through to get the required > > + alignment in bytes. Otherwise, fail analysis since alignment > > + peeling wouldn't work in such a case. */ > > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > > + if (STMT_VINFO_GROUPED_ACCESS (stmt_info)) > > + vf *= DR_GROUP_SIZE (stmt_info); > > > > so this is the total number of scalars we load, so maybe call it > > that way, num_scalars. > > Will do. > > > > > + > > + auto num_vectors = vect_get_num_vectors (vf, vectype); > > + if (!pow2p_hwi (num_vectors)) > > > > side-note - with all these multiplies there's the possibility that > > we get a testcase that has safe_align > PAGE_SIZE, meaning it's > > no longer a good way to avoid trapping. This problem of course > > exists generally, we avoid it elsewhere by not having very large > > vectors or limiting the group-size. The actual problem is that > > we don't know the actual page size, but we maybe could configure > > a minimum as part of the platform configuration. Should we for > > now simply add > > > > || safe_align > 4096 > > > > here? A testcase would load 512 contiguous uint64 to form an early exit > > condition, quite unlikely I guess. > > Good point. I suppose this really depends whether there are > targets/platforms that GCC supports with a page size smaller than 4k. > Perhaps a min_page_size target hook (defaulting to 4k) would be > sensible. Then if there are any such targets they can override the > hook. WDYT?
Yeah, though this might be overkill at this point and a hard-coded 4096 is fine ... > > > > With DR_GROUP_SIZE != 1 there's also the question whether we can > > ever reach the desired alignment since each peel will skip > > DR_GROUP_SIZE scalar elements - either those are already > > DR_GROUP_SIZE aligned or the result will never be. > > I had a look at this, I think this should already be handled by > vector_alignment_reachable_p which already includes a suitable check for > STMT_VINFO_GROUPED_ACCESS. E.g. for the following testcase: > > double *a; > int f(int n) { > for (int i = 0; i < n; i += 2) > if (a[i]) > __builtin_abort(); > } > > we have DR_GROUP_SIZE = 2 and in the dump (-O3, with the alignment peeling > patches) we see: > > note: === vect_enhance_data_refs_alignment === > missed: vector alignment may not be reachable > note: vect_can_advance_ivs_p: > note: Analyze phi: i_12 = PHI <i_9(8), 0(7)> > note: Alignment of access forced using versioning. > note: Versioning for alignment will be applied. > > so we decide to version instead, since peeling isn't viable here > (vector_alignment_reachable_p correctly returns false). Ah yeah, I thought you might need to adjust that but indeed the check there should already work. > Perhaps that says the DR flag should really be called > need_peeling_or_versioning_for_alignment (although better I guess just > to update the comment above its definition to mention versioning). Or simply ->requires_alignment however we arrange for that. But it's just a name, so ... > > > > @@ -7208,7 +7277,8 @@ vect_supportable_dr_alignment (vec_info *vinfo, > > dr_vec_info *dr_info, > > if (misalignment == DR_MISALIGNMENT_UNKNOWN) > > is_packed = not_size_aligned (DR_REF (dr)); > > if (targetm.vectorize.support_vector_misalignment (mode, type, > > misalignment, > > - is_packed)) > > + is_packed) > > + && !dr_info->need_peeling_for_alignment) > > return dr_unaligned_supported; > > > > I think you need to do this earlier, like with > > > > if (misalignment == 0) > > return dr_aligned; > > + else if (dr_info->need_peeling_for_alignment) > > + return dr_unaligned_unsupported; > > That seems more obviously correct indeed. I'll make that change. > > > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > > index c8dc7153298..be2c2a1bc75 100644 > > --- a/gcc/tree-vect-loop-manip.cc > > +++ b/gcc/tree-vect-loop-manip.cc > > @@ -3129,12 +3129,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree > > niters, tree nitersm1, > > int estimated_vf; > > int prolog_peeling = 0; > > bool vect_epilogues = loop_vinfo->epilogue_vinfo != NULL; > > - /* We currently do not support prolog peeling if the target alignment > > is not > > - known at compile time. 'vect_gen_prolog_loop_niters' depends on the > > - target alignment being constant. */ > > - dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo); > > - if (dr_info && !DR_TARGET_ALIGNMENT (dr_info).is_constant ()) > > - return NULL; > > > > this indeed looks like an odd (wrong) place to enforce this, so I > > suppose you figured the check is not needed, independent of the patch? > > Yeah, exactly, it seems that it isn't needed. Agreed. > > > > OK with the above changes. > > Thanks! I guess we just need to decide on the page size issue above > before going ahead. Just use a check against 4096 with a comment I'd say. The set of targets with vector cbranch (to have early exit vect in the first place) is quite limited and none supports a smaller page size. Richard. > Alex > > > > > Thanks, > > Richard. > > > > > > > Thanks, > > > Alex > > > > > > -- >8 -- > > > > > > This allows us to vectorize more loops with early exits by forcing > > > peeling for alignment to make sure that we're guaranteed to be able to > > > safely read an entire vector iteration without crossing a page boundary. > > > > > > To make this work for VLA architectures we have to allow compile-time > > > non-constant target alignments. We also have to override the result of > > > the target's preferred_vector_alignment hook if it isn't already a > > > power-of-two multiple of the amount read per vector iteration. > > > > > > gcc/ChangeLog: > > > > > > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): > > > Set need_peeling_for_alignment flag on read DRs instead of > > > failing vectorization. Punt on gathers and strided_p accesses. > > > (dr_misalignment): Handle non-constant target alignments. > > > (vect_compute_data_ref_alignment): If need_peeling_for_alignment > > > flag is set on the DR, then override the target alignment chosen > > > by the preferred_vector_alignment hook to choose a safe > > > alignment. Add new result parameter. Use it ... > > > (vect_analyze_data_refs_alignment): ... here, and fail > > > analysis if vect_compute_data_ref_alignment sets it to a > > > failure. > > > (vect_supportable_dr_alignment): Override > > > support_vector_misalignment hook if need_peeling_for_alignment > > > is set on the DR: in this case we must return > > > dr_unaligned_unsupported in order to force peeling. > > > * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog > > > peeling by a compile-time non-constant amount. > > > * tree-vectorizer.h (dr_vec_info): Add new flag > > > need_peeling_for_alignment. > > > > > > > -- > > Richard Biener <rguent...@suse.de> > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)