On Wed, Mar 18, 2020 at 2:56 PM Kewen.Lin <li...@linux.ibm.com> wrote: > > Hi Richi, > > Thanks for your comments. > > on 2020/3/18 下午6:39, Richard Biener wrote: > > On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin <li...@linux.ibm.com> wrote: > >> > >> Hi, > >> > >> As PR90332 shows, the current scalar epilogue peeling for gaps > >> elimination requires expected vec_init optab with two half size > >> vector mode. On Power, we don't support vector mode like V8QI, > >> so can't support optab like vec_initv16qiv8qi. But we want to > >> leverage existing scalar mode like DI to init the desirable > >> vector mode. This patch is to extend the existing support for > >> Power, as evaluated on Power9 we can see expected 1.9% speed up > >> on SPEC2017 525.x264_r. > >> > >> Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9. > >> > >> Is it ok for trunk? > > > > There's already code exercising such a case in vectorizable_load > > (VMAT_STRIDED_SLP) which you could have factored out. > > > > Nice, will refer to and factor it. > > > vectype, bool slp, > > than the alignment boundary B. Every vector access will > > be a multiple of B and so we are guaranteed to access a > > non-gap element in the same B-sized block. */ > > + machine_mode half_mode; > > if (overrun_p > > && gap < (vect_known_alignment_in_bytes (first_dr_info) > > / vect_get_scalar_dr_size (first_dr_info))) > > - overrun_p = false; > > - > > + { > > + overrun_p = false; > > + if (known_eq (nunits, (group_size - gap) * 2) > > + && known_eq (nunits, group_size) > > + && get_half_mode_for_vector (vectype, &half_mode)) > > + DR_GROUP_HALF_MODE (first_stmt_info) = half_mode; > > + } > > > > why do you need to amend this case? > > > > This path can define overrun_p to false, some case can fall into > "no peeling for gaps" hunk in vectorizable_load. Since I used > DR_GROUP_HALF_MODE to save the half mode, if some case matches > this condition, vectorizable_load hunk can get unitialized > DR_GROUP_HALF_MODE. But even with proposed recomputing way, I > think we still need to check the vec_init optab here if the > know_eq half size conditions hold?
Hmm, but for the above case it's fine to access the excess elements. I guess the vectorizable_load code needs to be amended with the alignment check or we do need to store somewhere our decision to use smaller loads. > > > I don't like storing DR_GROUP_HALF_MODE very much, later > > you need a vector type and it looks cheap enough to recompute > > it where you need it? Iff then it doesn't belong to DR_GROUP > > but to the stmt-info. > > > > OK, I was intended not to recompute it for time saving, will > throw it away. > > > I realize the original optimization was kind of a hack (and I was too > > lazy to implement the integer mode construction path ...). > > > > So, can you factor out the existing code into a function returning > > the vector type for construction for a vector type and a > > pieces size? So for V16QI and a pieces-size of 4 we'd > > get either V16QI back (then construction from V4QI pieces > > should work) or V4SI (then construction from SImode pieces > > should work)? Eventually as secondary output provide that > > piece type (SI / V4QI). > > Sure. I'm very poor to get a function name, does function name > suitable_vector_and_pieces sound good? > ie. tree suitable_vector_and_pieces (tree vtype, tree *ptype); tree vector_vector_composition_type (tree vtype, poly_uint64 nelts, tree *ptype); where nelts specifies the number of vtype elements in a piece. Richard. > > BR, > Kewen >