On 18 November 2016 at 16:46, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > It is very strange that this test failed on arm, since it requires > target avx2 to check vectorizer dumps: > > /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { > target avx2_runtime } } } */ > /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED > \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */ > > Could you please clarify what is the reason of the failure?
It's not the scan-dumps that fail, but the execution. The test calls abort() for some reason. It will take me a while to rebuild the test manually in the right debug environment to provide you with more traces. > > Thanks. > > 2016-11-18 16:20 GMT+03:00 Christophe Lyon <christophe.l...@linaro.org>: >> On 15 November 2016 at 15:41, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Hi All, >>> >>> Here is patch for non-masked epilogue vectoriziation. >>> >>> Bootstrap and regression testing did not show any new failures. >>> >>> Is it OK for trunk? >>> >>> Thanks. >>> Changelog: >>> >>> 2016-11-15 Yuri Rumyantsev <ysrum...@gmail.com> >>> >>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. >>> * tree-if-conv.c (tree_if_conversion): Make public. >>> * * tree-if-conv.h: New file. >>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid >>> dynamic alias checks for epilogues. >>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. >>> * tree-vect-loop.c: include tree-if-conv.h. >>> (new_loop_vec_info): Add zeroing orig_loop_info field. >>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. >>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL >>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo >>> using passed argument. >>> (vect_transform_loop): Check if created epilogue should be returned >>> for further vectorization with less vf. If-convert epilogue if >>> required. Print vectorization success for epilogue. >>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization >>> if it is required, pass loop_vinfo produced during vectorization of >>> loop body to vect_analyze_loop. >>> * tree-vectorizer.h (struct _loop_vec_info): Add new field >>> orig_loop_info. >>> (LOOP_VINFO_ORIG_LOOP_INFO): New. >>> (LOOP_VINFO_EPILOGUE_P): New. >>> (LOOP_VINFO_ORIG_VECT_FACTOR): New. >>> (vect_do_peeling): Change prototype to return epilogue. >>> (vect_analyze_loop): Add argument of loop_vec_info type. >>> (vect_transform_loop): Return created loop. >>> >>> gcc/testsuite/ >>> >>> * lib/target-supports.exp (check_avx2_hw_available): New. >>> (check_effective_target_avx2_runtime): New. >>> * gcc.dg/vect/vect-tail-nomask-1.c: New test. >>> >> >> Hi, >> >> This new test fails on arm-none-eabi (using default cpu/fpu/mode): >> gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test >> gcc.dg/vect/vect-tail-nomask-1.c execution test >> >> It does pass on the same target if configured --with-cpu=cortex-a9. >> >> Christophe >> >> >> >>> >>> 2016-11-14 20:04 GMT+03:00 Richard Biener <rguent...@suse.de>: >>>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev >>>> <ysrum...@gmail.com> wrote: >>>>>Richard, >>>>> >>>>>I checked one of the tests designed for epilogue vectorization using >>>>>patches 1 - 3 and found out that build compiler performs vectorization >>>>>of epilogues with --param vect-epilogues-nomask=1 passed: >>>>> >>>>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o >>>>>t1.new-nomask.s -fdump-tree-vect-details >>>>>$ grep VECTORIZED -c t1.c.156t.vect >>>>>4 >>>>> Without param only 2 loops are vectorized. >>>>> >>>>>Should I simply add a part of tests related to this feature or I must >>>>>delete all not necessary changes also? >>>> >>>> Please remove all not necessary changes. >>>> >>>> Richard. >>>> >>>>>Thanks. >>>>>Yuri. >>>>> >>>>>2016-11-14 16:40 GMT+03:00 Richard Biener <rguent...@suse.de>: >>>>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: >>>>>> >>>>>>> Richard, >>>>>>> >>>>>>> In my previous patch I forgot to remove couple lines related to aux >>>>>field. >>>>>>> Here is the correct updated patch. >>>>>> >>>>>> Yeah, I noticed. This patch would be ok for trunk (together with >>>>>> necessary parts from 1 and 2) if all not required parts are removed >>>>>> (and you'd add the testcases covering non-masked tail vect). >>>>>> >>>>>> Thus, can you please produce a single complete patch containing only >>>>>> non-masked epilogue vectoriziation? >>>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>>> Thanks. >>>>>>> Yuri. >>>>>>> >>>>>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguent...@suse.de>: >>>>>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: >>>>>>> > >>>>>>> >> Richard, >>>>>>> >> >>>>>>> >> I prepare updated 3 patch with passing additional argument to >>>>>>> >> vect_analyze_loop as you proposed (untested). >>>>>>> >> >>>>>>> >> You wrote: >>>>>>> >> tw, I wonder if you can produce a single patch containing just >>>>>>> >> epilogue vectorization, that is combine patches 1-3 but rip out >>>>>>> >> changes only needed by later patches? >>>>>>> >> >>>>>>> >> Did you mean that I exclude all support for vectorization >>>>>epilogues, >>>>>>> >> i.e. exclude from 2-nd patch all non-related changes >>>>>>> >> like >>>>>>> >> >>>>>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >>>>>>> >> index 11863af..32011c1 100644 >>>>>>> >> --- a/gcc/tree-vect-loop.c >>>>>>> >> +++ b/gcc/tree-vect-loop.c >>>>>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) >>>>>>> >> LOOP_VINFO_PEELING_FOR_GAPS (res) = false; >>>>>>> >> LOOP_VINFO_PEELING_FOR_NITER (res) = false; >>>>>>> >> LOOP_VINFO_OPERANDS_SWAPPED (res) = false; >>>>>>> >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; >>>>>>> >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; >>>>>>> >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; >>>>>>> >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; >>>>>>> >> + LOOP_VINFO_NEED_MASKING (res) = false; >>>>>>> >> + LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL; >>>>>>> > >>>>>>> > Yes. >>>>>>> > >>>>>>> >> Did you mean also that new combined patch must be working patch, >>>>>i.e. >>>>>>> >> can be integrated without other patches? >>>>>>> > >>>>>>> > Yes. >>>>>>> > >>>>>>> >> Could you please look at updated patch? >>>>>>> > >>>>>>> > Will do. >>>>>>> > >>>>>>> > Thanks, >>>>>>> > Richard. >>>>>>> > >>>>>>> >> Thanks. >>>>>>> >> Yuri. >>>>>>> >> >>>>>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguent...@suse.de>: >>>>>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote: >>>>>>> >> > >>>>>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote: >>>>>>> >> >> >>>>>>> >> >> > Richard, >>>>>>> >> >> > >>>>>>> >> >> > Here is updated 3 patch. >>>>>>> >> >> > >>>>>>> >> >> > I checked that all new tests related to epilogue >>>>>vectorization passed with it. >>>>>>> >> >> > >>>>>>> >> >> > Your comments will be appreciated. >>>>>>> >> >> >>>>>>> >> >> A lot better now. Instead of the ->aux dance I now prefer to >>>>>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as >>>>>>> >> >> optional argument (if non-NULL we analyze the epilogue of that >>>>>>> >> >> loop_vinfo). OTOH I remember we mainly use it to get at the >>>>>>> >> >> original vectorization factor? So we can pass down an >>>>>(optional) >>>>>>> >> >> forced vectorization factor as well? >>>>>>> >> > >>>>>>> >> > Btw, I wonder if you can produce a single patch containing just >>>>>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out >>>>>>> >> > changes only needed by later patches? >>>>>>> >> > >>>>>>> >> > Thanks, >>>>>>> >> > Richard. >>>>>>> >> > >>>>>>> >> >> Richard. >>>>>>> >> >> >>>>>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener >>>>><rguent...@suse.de>: >>>>>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote: >>>>>>> >> >> > > >>>>>>> >> >> > >> Hi Richard, >>>>>>> >> >> > >> >>>>>>> >> >> > >> I did not understand your last remark: >>>>>>> >> >> > >> >>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >>>>>>> >> >> > >> > && dump_enabled_p ()) >>>>>>> >> >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, >>>>>vect_location, >>>>>>> >> >> > >> > "loop vectorized\n"); >>>>>>> >> >> > >> > - vect_transform_loop (loop_vinfo); >>>>>>> >> >> > >> > + new_loop = vect_transform_loop (loop_vinfo); >>>>>>> >> >> > >> > num_vectorized_loops++; >>>>>>> >> >> > >> > /* Now that the loop has been vectorized, allow >>>>>it to be unrolled >>>>>>> >> >> > >> > etc. */ >>>>>>> >> >> > >> > loop->force_vectorize = false; >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > + /* Add new loop to a processing queue. To make >>>>>it easier >>>>>>> >> >> > >> > + to match loop and its epilogue vectorization >>>>>in dumps >>>>>>> >> >> > >> > + put new loop as the next loop to process. >>>>>*/ >>>>>>> >> >> > >> > + if (new_loop) >>>>>>> >> >> > >> > + { >>>>>>> >> >> > >> > + loops.safe_insert (i + 1, new_loop->num); >>>>>>> >> >> > >> > + vect_loops_num = number_of_loops (cfun); >>>>>>> >> >> > >> > + } >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, >>>>>new_loop) >>>>>>> >> >> > >> f> unction which will set up stuff properly (and also >>>>>perform >>>>>>> >> >> > >> > the if-conversion of the epilogue there). >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue >>>>>vectorization >>>>>>> >> >> > >> > separately that would be great. >>>>>>> >> >> > >> >>>>>>> >> >> > >> Could you please clarify your proposal. >>>>>>> >> >> > > >>>>>>> >> >> > > When a loop was vectorized set things up to immediately >>>>>vectorize >>>>>>> >> >> > > its epilogue, avoiding changing the loop iteration and >>>>>avoiding >>>>>>> >> >> > > the re-use of ->aux. >>>>>>> >> >> > > >>>>>>> >> >> > > Richard. >>>>>>> >> >> > > >>>>>>> >> >> > >> Thanks. >>>>>>> >> >> > >> Yuri. >>>>>>> >> >> > >> >>>>>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener >>>>><rguent...@suse.de>: >>>>>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: >>>>>>> >> >> > >> > >>>>>>> >> >> > >> >> Hi All, >>>>>>> >> >> > >> >> >>>>>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review >>>>>which support >>>>>>> >> >> > >> >> vectorization of loop epilogues and loops with low >>>>>trip count. We >>>>>>> >> >> > >> >> assume that the only patch - >>>>>vec-tails-07-combine-tail.patch - was not >>>>>>> >> >> > >> >> approved by Jeff. >>>>>>> >> >> > >> >> >>>>>>> >> >> > >> >> I did re-base of all patches and performed >>>>>bootstrapping and >>>>>>> >> >> > >> >> regression testing that did not show any new failures. >>>>>Also all >>>>>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have >>>>>been changed >>>>>>> >> >> > >> >> accordingly. >>>>>>> >> >> > >> >> >>>>>>> >> >> > >> >> Is it OK for trunk? >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > I would have prefered that the series up to >>>>>-03-nomask-tails would >>>>>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but >>>>>unfortunately >>>>>>> >> >> > >> > the patchset is oddly separated. >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > I have a comment on that part nevertheless: >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment >>>>>(loop_vec_info >>>>>>> >> >> > >> > loop_vinfo) >>>>>>> >> >> > >> > /* Check if we can possibly peel the loop. */ >>>>>>> >> >> > >> > if (!vect_can_advance_ivs_p (loop_vinfo) >>>>>>> >> >> > >> > || !slpeel_can_duplicate_loop_p (loop, >>>>>single_exit (loop)) >>>>>>> >> >> > >> > - || loop->inner) >>>>>>> >> >> > >> > + || loop->inner >>>>>>> >> >> > >> > + /* Required peeling was performed in prologue >>>>>and >>>>>>> >> >> > >> > + is not required for epilogue. */ >>>>>>> >> >> > >> > + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) >>>>>>> >> >> > >> > do_peeling = false; >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > if (do_peeling >>>>>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment >>>>>(loop_vec_info >>>>>>> >> >> > >> > loop_vinfo) >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > do_versioning = >>>>>>> >> >> > >> > optimize_loop_nest_for_speed_p (loop) >>>>>>> >> >> > >> > - && (!loop->inner); /* FORNOW */ >>>>>>> >> >> > >> > + && (!loop->inner) /* FORNOW */ >>>>>>> >> >> > >> > + /* Required versioning was performed for the >>>>>>> >> >> > >> > + original loop and is not required for >>>>>epilogue. */ >>>>>>> >> >> > >> > + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > if (do_versioning) >>>>>>> >> >> > >> > { >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > please do that check in the single caller of this >>>>>function. >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I >>>>>believe that simply >>>>>>> >> >> > >> > passing down info from the processed parent would be >>>>>_much_ cleaner. >>>>>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change): >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void) >>>>>>> >> >> > >> > && dump_enabled_p ()) >>>>>>> >> >> > >> > dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, >>>>>vect_location, >>>>>>> >> >> > >> > "loop vectorized\n"); >>>>>>> >> >> > >> > - vect_transform_loop (loop_vinfo); >>>>>>> >> >> > >> > + new_loop = vect_transform_loop (loop_vinfo); >>>>>>> >> >> > >> > num_vectorized_loops++; >>>>>>> >> >> > >> > /* Now that the loop has been vectorized, allow >>>>>it to be unrolled >>>>>>> >> >> > >> > etc. */ >>>>>>> >> >> > >> > loop->force_vectorize = false; >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > + /* Add new loop to a processing queue. To make >>>>>it easier >>>>>>> >> >> > >> > + to match loop and its epilogue vectorization >>>>>in dumps >>>>>>> >> >> > >> > + put new loop as the next loop to process. >>>>>*/ >>>>>>> >> >> > >> > + if (new_loop) >>>>>>> >> >> > >> > + { >>>>>>> >> >> > >> > + loops.safe_insert (i + 1, new_loop->num); >>>>>>> >> >> > >> > + vect_loops_num = number_of_loops (cfun); >>>>>>> >> >> > >> > + } >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo, >>>>>new_loop) >>>>>>> >> >> > >> > function which will set up stuff properly (and also >>>>>perform >>>>>>> >> >> > >> > the if-conversion of the epilogue there). >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > That said, if we can get in non-masked epilogue >>>>>vectorization >>>>>>> >> >> > >> > separately that would be great. >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > I'm still torn about all the rest of the stuff and >>>>>question its >>>>>>> >> >> > >> > usability (esp. merging the epilogue with the main >>>>>vector loop). >>>>>>> >> >> > >> > But it has already been approved ... oh well. >>>>>>> >> >> > >> > >>>>>>> >> >> > >> > Thanks, >>>>>>> >> >> > >> > Richard. >>>>>>> >> >> > >> >>>>>>> >> >> > >> >>>>>>> >> >> > > >>>>>>> >> >> > > -- >>>>>>> >> >> > > Richard Biener <rguent...@suse.de> >>>>>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, >>>>>Graham Norton, HRB 21284 (AG Nuernberg) >>>>>>> >> >> > >>>>>>> >> >> >>>>>>> >> >> >>>>>>> >> > >>>>>>> >> > -- >>>>>>> >> > Richard Biener <rguent...@suse.de> >>>>>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >>>>>Norton, HRB 21284 (AG Nuernberg) >>>>>>> >> >>>>>>> > >>>>>>> > -- >>>>>>> > Richard Biener <rguent...@suse.de> >>>>>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >>>>>Norton, HRB 21284 (AG Nuernberg) >>>>>>> >>>>>> >>>>>> -- >>>>>> Richard Biener <rguent...@suse.de> >>>>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham >>>>>Norton, HRB 21284 (AG Nuernberg) >>>> >>>>