On Thu, 21 Nov 2024, Alex Coplan wrote:

> On 21/11/2024 10:02, Richard Biener wrote:
> > On Fri, 15 Nov 2024, Alex Coplan wrote:
> > 
> > > Hi,
> > > 
> > > This is a v2 which hopefully addresses the feedback for v1 of the 1/5
> > > patch, originally posted here:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666648.html
> > > 
> > > As mentioned on IRC, it will need follow-up work to fix up latent
> > > profile issues, but that can be done during stage 3.  We will also need
> > > a simple (hopefully obvious, even) follow-up patch to fix expectations
> > > for various tests (since we now vectorize loops which we previously
> > > couldn't).
> > > 
> > > OK for trunk?
> > 
> > I'm still looking at
> > 
> > +  if (dr_info->need_peeling_for_alignment)
> > +    {
> > +      /* Vector size in bytes.  */
> > +      poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT 
> > (vectype));
> > +
> > +      /* We can only peel for loops, of course.  */
> > +      gcc_checking_assert (loop_vinfo);
> > +
> > +      /* Calculate the number of vectors read per vector iteration.  If
> > +        it is a power of two, multiply through to get the required
> > +        alignment in bytes.  Otherwise, fail analysis since alignment
> > +        peeling wouldn't work in such a case.  */
> > +      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > +      if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> > +       vf *= DR_GROUP_SIZE (stmt_info);
> > 
> > so this is the total number of scalars we load, so maybe call it
> > that way, num_scalars.
> 
> Will do.
> 
> > 
> > +
> > +      auto num_vectors = vect_get_num_vectors (vf, vectype);
> > +      if (!pow2p_hwi (num_vectors))
> > 
> > side-note - with all these multiplies there's the possibility that
> > we get a testcase that has safe_align > PAGE_SIZE, meaning it's
> > no longer a good way to avoid trapping.  This problem of course
> > exists generally, we avoid it elsewhere by not having very large
> > vectors or limiting the group-size.  The actual problem is that
> > we don't know the actual page size, but we maybe could configure
> > a minimum as part of the platform configuration.  Should we for
> > now simply add
> > 
> >   || safe_align > 4096
> > 
> > here?  A testcase would load 512 contiguous uint64 to form an early exit
> > condition, quite unlikely I guess.
> 
> Good point.  I suppose this really depends whether there are
> targets/platforms that GCC supports with a page size smaller than 4k.
> Perhaps a min_page_size target hook (defaulting to 4k) would be
> sensible.  Then if there are any such targets they can override the
> hook.  WDYT?

Yeah, though this might be overkill at this point and a hard-coded
4096 is fine ...

> > 
> > With DR_GROUP_SIZE != 1 there's also the question whether we can
> > ever reach the desired alignment since each peel will skip
> > DR_GROUP_SIZE scalar elements - either those are already
> > DR_GROUP_SIZE aligned or the result will never be.
> 
> I had a look at this, I think this should already be handled by
> vector_alignment_reachable_p which already includes a suitable check for
> STMT_VINFO_GROUPED_ACCESS.  E.g. for the following testcase:
> 
> double *a;
> int f(int n) {
>   for (int i = 0; i < n; i += 2)
>     if (a[i])
>       __builtin_abort();
> }
> 
> we have DR_GROUP_SIZE = 2 and in the dump (-O3, with the alignment peeling
> patches) we see:
> 
> note:   === vect_enhance_data_refs_alignment ===
> missed:   vector alignment may not be reachable
> note: vect_can_advance_ivs_p:
> note:   Analyze phi: i_12 = PHI <i_9(8), 0(7)>
> note:   Alignment of access forced using versioning.
> note:   Versioning for alignment will be applied.
> 
> so we decide to version instead, since peeling isn't viable here
> (vector_alignment_reachable_p correctly returns false).

Ah yeah, I thought you might need to adjust that but indeed the check
there should already work.

> Perhaps that says the DR flag should really be called
> need_peeling_or_versioning_for_alignment (although better I guess just
> to update the comment above its definition to mention versioning).

Or simply ->requires_alignment however we arrange for that.  But it's
just a name, so ...

> > 
> > @@ -7208,7 +7277,8 @@ vect_supportable_dr_alignment (vec_info *vinfo, 
> > dr_vec_info *dr_info,
> >    if (misalignment == DR_MISALIGNMENT_UNKNOWN)
> >      is_packed = not_size_aligned (DR_REF (dr));
> >    if (targetm.vectorize.support_vector_misalignment (mode, type, 
> > misalignment,
> > -                                                    is_packed))
> > +                                                    is_packed)
> > +      && !dr_info->need_peeling_for_alignment)
> >      return dr_unaligned_supported;
> > 
> > I think you need to do this earlier, like with
> > 
> >   if (misalignment == 0)
> >     return dr_aligned;
> > + else if (dr_info->need_peeling_for_alignment)
> > +   return dr_unaligned_unsupported;
> 
> That seems more obviously correct indeed.  I'll make that change.
> 
> > 
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index c8dc7153298..be2c2a1bc75 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -3129,12 +3129,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
> > niters, tree nitersm1,
> >    int estimated_vf;
> >    int prolog_peeling = 0;
> >    bool vect_epilogues = loop_vinfo->epilogue_vinfo != NULL;
> > -  /* We currently do not support prolog peeling if the target alignment 
> > is not
> > -     known at compile time.  'vect_gen_prolog_loop_niters' depends on the
> > -     target alignment being constant.  */
> > -  dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
> > -  if (dr_info && !DR_TARGET_ALIGNMENT (dr_info).is_constant ())
> > -    return NULL;
> > 
> > this indeed looks like an odd (wrong) place to enforce this, so I
> > suppose you figured the check is not needed, independent of the patch?
> 
> Yeah, exactly, it seems that it isn't needed.

Agreed.

> > 
> > OK with the above changes.
> 
> Thanks!  I guess we just need to decide on the page size issue above
> before going ahead.

Just use a check against 4096 with a comment I'd say.  The set of
targets with vector cbranch (to have early exit vect in the first place)
is quite limited and none supports a smaller page size.

Richard.

> Alex
> 
> > 
> > Thanks,
> > Richard.
> > 
> > 
> > > Thanks,
> > > Alex
> > > 
> > > -- >8 --
> > > 
> > > This allows us to vectorize more loops with early exits by forcing
> > > peeling for alignment to make sure that we're guaranteed to be able to
> > > safely read an entire vector iteration without crossing a page boundary.
> > > 
> > > To make this work for VLA architectures we have to allow compile-time
> > > non-constant target alignments.  We also have to override the result of
> > > the target's preferred_vector_alignment hook if it isn't already a
> > > power-of-two multiple of the amount read per vector iteration.
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
> > >   Set need_peeling_for_alignment flag on read DRs instead of
> > >   failing vectorization.  Punt on gathers and strided_p accesses.
> > >   (dr_misalignment): Handle non-constant target alignments.
> > >   (vect_compute_data_ref_alignment): If need_peeling_for_alignment
> > >   flag is set on the DR, then override the target alignment chosen
> > >   by the preferred_vector_alignment hook to choose a safe
> > >   alignment.  Add new result parameter.  Use it ...
> > >   (vect_analyze_data_refs_alignment): ... here, and fail
> > >   analysis if vect_compute_data_ref_alignment sets it to a
> > >   failure.
> > >   (vect_supportable_dr_alignment): Override
> > >   support_vector_misalignment hook if need_peeling_for_alignment
> > >   is set on the DR: in this case we must return
> > >   dr_unaligned_unsupported in order to force peeling.
> > >   * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
> > >   peeling by a compile-time non-constant amount.
> > >   * tree-vectorizer.h (dr_vec_info): Add new flag
> > >   need_peeling_for_alignment.
> > > 
> > 
> > -- 
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to