On Wed, Mar 18, 2020 at 2:56 PM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> Hi Richi,
>
> Thanks for your comments.
>
> on 2020/3/18 下午6:39, Richard Biener wrote:
> > On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> As PR90332 shows, the current scalar epilogue peeling for gaps
> >> elimination requires expected vec_init optab with two half size
> >> vector mode.  On Power, we don't support vector mode like V8QI,
> >> so can't support optab like vec_initv16qiv8qi.  But we want to
> >> leverage existing scalar mode like DI to init the desirable
> >> vector mode.  This patch is to extend the existing support for
> >> Power, as evaluated on Power9 we can see expected 1.9% speed up
> >> on SPEC2017 525.x264_r.
> >>
> >> Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9.
> >>
> >> Is it ok for trunk?
> >
> > There's already code exercising such a case in vectorizable_load
> > (VMAT_STRIDED_SLP) which you could have factored out.
> >
>
> Nice, will refer to and factor it.
>
> >  vectype, bool slp,
> >              than the alignment boundary B.  Every vector access will
> >              be a multiple of B and so we are guaranteed to access a
> >              non-gap element in the same B-sized block.  */
> > +         machine_mode half_mode;
> >           if (overrun_p
> >               && gap < (vect_known_alignment_in_bytes (first_dr_info)
> >                         / vect_get_scalar_dr_size (first_dr_info)))
> > -           overrun_p = false;
> > -
> > +           {
> > +             overrun_p = false;
> > +             if (known_eq (nunits, (group_size - gap) * 2)
> > +                 && known_eq (nunits, group_size)
> > +                 && get_half_mode_for_vector (vectype, &half_mode))
> > +               DR_GROUP_HALF_MODE (first_stmt_info) = half_mode;
> > +           }
> >
> > why do you need to amend this case?
> >
>
> This path can define overrun_p to false, some case can fall into
> "no peeling for gaps" hunk in vectorizable_load.  Since I used
> DR_GROUP_HALF_MODE to save the half mode, if some case matches
> this condition, vectorizable_load hunk can get unitialized
> DR_GROUP_HALF_MODE.  But even with proposed recomputing way, I
> think we still need to check the vec_init optab here if the
> know_eq half size conditions hold?

Hmm, but for the above case it's fine to access the excess elements.

I guess the vectorizable_load code needs to be amended with
the alignment check or we do need to store somewhere our
decision to use smaller loads.

>
> > I don't like storing DR_GROUP_HALF_MODE very much, later
> > you need a vector type and it looks cheap enough to recompute
> > it where you need it?  Iff then it doesn't belong to DR_GROUP
> > but to the stmt-info.
> >
>
> OK, I was intended not to recompute it for time saving, will
> throw it away.
>
> > I realize the original optimization was kind of a hack (and I was too
> > lazy to implement the integer mode construction path ...).
> >
> > So, can you factor out the existing code into a function returning
> > the vector type for construction for a vector type and a
> > pieces size?  So for V16QI and a pieces-size of 4 we'd
> > get either V16QI back (then construction from V4QI pieces
> > should work) or V4SI (then construction from SImode pieces
> > should work)?  Eventually as secondary output provide that
> > piece type (SI / V4QI).
>
> Sure.  I'm very poor to get a function name, does function name
> suitable_vector_and_pieces sound good?
>   ie. tree suitable_vector_and_pieces (tree vtype, tree *ptype);

tree vector_vector_composition_type (tree vtype, poly_uint64 nelts,
tree *ptype);

where nelts specifies the number of vtype elements in a piece.

Richard.

>
> BR,
> Kewen
>

Reply via email to