On 15 November 2016 at 15:41, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Hi All, > > Here is patch for non-masked epilogue vectoriziation. > > Bootstrap and regression testing did not show any new failures. > > Is it OK for trunk? > > Thanks. > Changelog: > > 2016-11-15 Yuri Rumyantsev <ysrum...@gmail.com> > > * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. > * tree-if-conv.c (tree_if_conversion): Make public. > * * tree-if-conv.h: New file. > * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid > dynamic alias checks for epilogues. > * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. > * tree-vect-loop.c: include tree-if-conv.h. > (new_loop_vec_info): Add zeroing orig_loop_info field. > (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. > (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL > if epilogue is vectorized, set up orig_loop_info field of loop_vinfo > using passed argument. > (vect_transform_loop): Check if created epilogue should be returned > for further vectorization with less vf. If-convert epilogue if > required. Print vectorization success for epilogue. > * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization > if it is required, pass loop_vinfo produced during vectorization of > loop body to vect_analyze_loop. > * tree-vectorizer.h (struct _loop_vec_info): Add new field > orig_loop_info. > (LOOP_VINFO_ORIG_LOOP_INFO): New. > (LOOP_VINFO_EPILOGUE_P): New. > (LOOP_VINFO_ORIG_VECT_FACTOR): New. > (vect_do_peeling): Change prototype to return epilogue. > (vect_analyze_loop): Add argument of loop_vec_info type. > (vect_transform_loop): Return created loop. > > gcc/testsuite/ > > * lib/target-supports.exp (check_avx2_hw_available): New. > (check_effective_target_avx2_runtime): New. > * gcc.dg/vect/vect-tail-nomask-1.c: New test. >
Hi, This new test fails on arm-none-eabi (using default cpu/fpu/mode): gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test gcc.dg/vect/vect-tail-nomask-1.c execution test It does pass on the same target if configured --with-cpu=cortex-a9. Christophe > > 2016-11-14 20:04 GMT+03:00 Richard Biener <rguent...@suse.de>: >> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev >> <ysrum...@gmail.com> wrote: >>>Richard, >>> >>>I checked one of the tests designed for epilogue vectorization using >>>patches 1 - 3 and found out that build compiler performs vectorization >>>of epilogues with --param vect-epilogues-nomask=1 passed: >>> >>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o >>>t1.new-nomask.s -fdump-tree-vect-details >>>$ grep VECTORIZED -c t1.c.156t.vect >>>4 >>> Without param only 2 loops are vectorized. >>> >>>Should I simply add a part of tests related to this feature or I must >>>delete all not necessary changes also? >> >> Please remove all not necessary changes. >> >> Richard. >> >>>Thanks. >>>Yuri. >>> >>>2016-11-14 16:40 GMT+03:00 Richard Biener <rguent...@suse.de>: >>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: >>>> >>>>> Richard, >>>>> >>>>> In my previous patch I forgot to remove couple lines related to aux >>>field. >>>>> Here is the correct updated patch. >>>> >>>> Yeah, I noticed. This patch would be ok for trunk (together with >>>> necessary parts from 1 and 2) if all not required parts are removed >>>> (and you'd add the testcases covering non-masked tail vect). >>>> >>>> Thus, can you please produce a single complete patch containing only >>>> non-masked epilogue vectoriziation? >>>> >>>> Thanks, >>>> Richard. >>>> >>>>> Thanks. >>>>> Yuri. >>>>> >>>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguent...@suse.de>: >>>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: >>>>> > >>>>> >> Richard, >>>>> >> >>>>> >> I prepare updated 3 patch with passing additional argument to >>>>> >> vect_analyze_loop as you proposed (untested). >>>>> >> >>>>> >> You wrote: >>>>> >> tw, I wonder if you can produce a single patch containing just >>>>> >> epilogue vectorization, that is combine patches 1-3 but rip out >>>>> >> changes only needed by later patches? >>>>> >> >>>>> >> Did you mean that I exclude all support for vectorization >>>epilogues, >>>>> >> i.e. exclude from 2-nd patch all non-related changes >>>>> >> like >>>>> >> >>>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >>>>> >> index 11863af..32011c1 100644 >>>>> >> --- a/gcc/tree-vect-loop.c >>>>> >> +++ b/gcc/tree-vect-loop.c >>>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) >>>>> >> LOOP_VINFO_PEELING_FOR_GAPS (res) = false; >>>>> >> LOOP_VINFO_PEELING_FOR_NITER (res) = false; >>>>> >> LOOP_VINFO_OPERANDS_SWAPPED (res) = false; >>>>> >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; >>>>> >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; >>>>> >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; >>>>> >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; >>>>> >> + LOOP_VINFO_NEED_MASKING (res) = false; >>>>> >> + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; >>>>> > >>>>> > Yes. >>>>> > >>>>> >> Did you mean also that new combined patch must be working patch, >>>i.e. >>>>> >> can be integrated without other patches? >>>>> > >>>>> > Yes. >>>>> > >>>>> >> Could you please look at updated patch? >>>>> > >>>>> > Will do. >>>>> > >>>>> > Thanks, >>>>> > Richard. >>>>> > >>>>> >> Thanks. >>>>> >> Yuri. >>>>> >> >>>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguent...@suse.de>: >>>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote: >>>>> >> > >>>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: >>>>> >> >> >>>>> >> >> > Richard, >>>>> >> >> > >>>>> >> >> > Here is updated 3 patch. >>>>> >> >> > >>>>> >> >> > I checked that all new tests related to epilogue >>>vectorization passed with it. >>>>> >> >> > >>>>> >> >> > Your comments will be appreciated. >>>>> >> >> >>>>> >> >> A lot better now. Instead of the ->aux dance I now prefer to >>>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as >>>>> >> >> optional argument (if non-NULL we analyze the epilogue of that >>>>> >> >> loop_vinfo). OTOH I remember we mainly use it to get at the >>>>> >> >> original vectorization factor? So we can pass down an >>>(optional) >>>>> >> >> forced vectorization factor as well? >>>>> >> > >>>>> >> > Btw, I wonder if you can produce a single patch containing just >>>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out >>>>> >> > changes only needed by later patches? >>>>> >> > >>>>> >> > Thanks, >>>>> >> > Richard. >>>>> >> > >>>>> >> >> Richard. >>>>> >> >> >>>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener >>><rguent...@suse.de>: >>>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: >>>>> >> >> > > >>>>> >> >> > >> Hi Richard, >>>>> >> >> > >> >>>>> >> >> > >> I did not understand your last remark: >>>>> >> >> > >> >>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >>>>> >> >> > >> > >>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >>>>> >> >> > >> > && dump_enabled_p ()) >>>>> >> >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, >>>vect_location, >>>>> >> >> > >> > "loop vectorized\n"); >>>>> >> >> > >> > - vect_transform_loop (loop_vinfo); >>>>> >> >> > >> > + new_loop = vect_transform_loop (loop_vinfo); >>>>> >> >> > >> > num_vectorized_loops++; >>>>> >> >> > >> > /* Now that the loop has been vectorized, allow >>>it to be unrolled >>>>> >> >> > >> > etc. */ >>>>> >> >> > >> > loop->force_vectorize = false; >>>>> >> >> > >> > >>>>> >> >> > >> > + /* Add new loop to a processing queue. To make >>>it easier >>>>> >> >> > >> > + to match loop and its epilogue vectorization >>>in dumps >>>>> >> >> > >> > + put new loop as the next loop to process. >>>*/ >>>>> >> >> > >> > + if (new_loop) >>>>> >> >> > >> > + { >>>>> >> >> > >> > + loops.safe_insert (i + 1, new_loop->num); >>>>> >> >> > >> > + vect_loops_num = number_of_loops (cfun); >>>>> >> >> > >> > + } >>>>> >> >> > >> > >>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, >>>new_loop) >>>>> >> >> > >> f> unction which will set up stuff properly (and also >>>perform >>>>> >> >> > >> > the if-conversion of the epilogue there). >>>>> >> >> > >> > >>>>> >> >> > >> > That said, if we can get in non-masked epilogue >>>vectorization >>>>> >> >> > >> > separately that would be great. >>>>> >> >> > >> >>>>> >> >> > >> Could you please clarify your proposal. >>>>> >> >> > > >>>>> >> >> > > When a loop was vectorized set things up to immediately >>>vectorize >>>>> >> >> > > its epilogue, avoiding changing the loop iteration and >>>avoiding >>>>> >> >> > > the re-use of ->aux. >>>>> >> >> > > >>>>> >> >> > > Richard. >>>>> >> >> > > >>>>> >> >> > >> Thanks. >>>>> >> >> > >> Yuri. >>>>> >> >> > >> >>>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener >>><rguent...@suse.de>: >>>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: >>>>> >> >> > >> > >>>>> >> >> > >> >> Hi All, >>>>> >> >> > >> >> >>>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review >>>which support >>>>> >> >> > >> >> vectorization of loop epilogues and loops with low >>>trip count. We >>>>> >> >> > >> >> assume that the only patch - >>>vec-tails-07-combine-tail.patch - was not >>>>> >> >> > >> >> approved by Jeff. >>>>> >> >> > >> >> >>>>> >> >> > >> >> I did re-base of all patches and performed >>>bootstrapping and >>>>> >> >> > >> >> regression testing that did not show any new failures. >>>Also all >>>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have >>>been changed >>>>> >> >> > >> >> accordingly. >>>>> >> >> > >> >> >>>>> >> >> > >> >> Is it OK for trunk? >>>>> >> >> > >> > >>>>> >> >> > >> > I would have prefered that the series up to >>>-03-nomask-tails would >>>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but >>>unfortunately >>>>> >> >> > >> > the patchset is oddly separated. >>>>> >> >> > >> > >>>>> >> >> > >> > I have a comment on that part nevertheless: >>>>> >> >> > >> > >>>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment >>>(loop_vec_info >>>>> >> >> > >> > loop_vinfo) >>>>> >> >> > >> > /* Check if we can possibly peel the loop. */ >>>>> >> >> > >> > if (!vect_can_advance_ivs_p (loop_vinfo) >>>>> >> >> > >> > || !slpeel_can_duplicate_loop_p (loop, >>>single_exit (loop)) >>>>> >> >> > >> > - || loop->inner) >>>>> >> >> > >> > + || loop->inner >>>>> >> >> > >> > + /* Required peeling was performed in prologue >>>and >>>>> >> >> > >> > + is not required for epilogue. */ >>>>> >> >> > >> > + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) >>>>> >> >> > >> > do_peeling = false; >>>>> >> >> > >> > >>>>> >> >> > >> > if (do_peeling >>>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment >>>(loop_vec_info >>>>> >> >> > >> > loop_vinfo) >>>>> >> >> > >> > >>>>> >> >> > >> > do_versioning = >>>>> >> >> > >> > optimize_loop_nest_for_speed_p (loop) >>>>> >> >> > >> > - && (!loop->inner); /* FORNOW */ >>>>> >> >> > >> > + && (!loop->inner) /* FORNOW */ >>>>> >> >> > >> > + /* Required versioning was performed for the >>>>> >> >> > >> > + original loop and is not required for >>>epilogue. */ >>>>> >> >> > >> > + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); >>>>> >> >> > >> > >>>>> >> >> > >> > if (do_versioning) >>>>> >> >> > >> > { >>>>> >> >> > >> > >>>>> >> >> > >> > please do that check in the single caller of this >>>function. >>>>> >> >> > >> > >>>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I >>>believe that simply >>>>> >> >> > >> > passing down info from the processed parent would be >>>_much_ cleaner. >>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >>>>> >> >> > >> > >>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >>>>> >> >> > >> > && dump_enabled_p ()) >>>>> >> >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, >>>vect_location, >>>>> >> >> > >> > "loop vectorized\n"); >>>>> >> >> > >> > - vect_transform_loop (loop_vinfo); >>>>> >> >> > >> > + new_loop = vect_transform_loop (loop_vinfo); >>>>> >> >> > >> > num_vectorized_loops++; >>>>> >> >> > >> > /* Now that the loop has been vectorized, allow >>>it to be unrolled >>>>> >> >> > >> > etc. */ >>>>> >> >> > >> > loop->force_vectorize = false; >>>>> >> >> > >> > >>>>> >> >> > >> > + /* Add new loop to a processing queue. To make >>>it easier >>>>> >> >> > >> > + to match loop and its epilogue vectorization >>>in dumps >>>>> >> >> > >> > + put new loop as the next loop to process. >>>*/ >>>>> >> >> > >> > + if (new_loop) >>>>> >> >> > >> > + { >>>>> >> >> > >> > + loops.safe_insert (i + 1, new_loop->num); >>>>> >> >> > >> > + vect_loops_num = number_of_loops (cfun); >>>>> >> >> > >> > + } >>>>> >> >> > >> > >>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, >>>new_loop) >>>>> >> >> > >> > function which will set up stuff properly (and also >>>perform >>>>> >> >> > >> > the if-conversion of the epilogue there). >>>>> >> >> > >> > >>>>> >> >> > >> > That said, if we can get in non-masked epilogue >>>vectorization >>>>> >> >> > >> > separately that would be great. >>>>> >> >> > >> > >>>>> >> >> > >> > I'm still torn about all the rest of the stuff and >>>question its >>>>> >> >> > >> > usability (esp. merging the epilogue with the main >>>vector loop). >>>>> >> >> > >> > But it has already been approved ... oh well. >>>>> >> >> > >> > >>>>> >> >> > >> > Thanks, >>>>> >> >> > >> > Richard. >>>>> >> >> > >> >>>>> >> >> > >> >>>>> >> >> > > >>>>> >> >> > > -- >>>>> >> >> > > Richard Biener <rguent...@suse.de> >>>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, >>>Graham Norton, HRB 21284 (AG Nuernberg) >>>>> >> >> > >>>>> >> >> >>>>> >> >> >>>>> >> > >>>>> >> > -- >>>>> >> > Richard Biener <rguent...@suse.de> >>>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >>>Norton, HRB 21284 (AG Nuernberg) >>>>> >> >>>>> > >>>>> > -- >>>>> > Richard Biener <rguent...@suse.de> >>>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >>>Norton, HRB 21284 (AG Nuernberg) >>>>> >>>> >>>> -- >>>> Richard Biener <rguent...@suse.de> >>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >>>Norton, HRB 21284 (AG Nuernberg) >> >>