> On Tue, Feb 14, 2017 at 2:13 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: > > On Tue, Feb 14, 2017 at 1:57 PM, Jan Hubicka <hubi...@ucw.cz> wrote: > >>> Thanks, > >>> bin > >>> 2017-02-13 Bin Cheng <bin.ch...@arm.com> > >>> > >>> PR tree-optimization/79347 > >>> * tree-vect-loop-manip.c (apply_probability_for_bb): New function. > >>> (vect_do_peeling): Maintain profile counters during peeling. > >>> > >>> gcc/testsuite/ChangeLog > >>> 2017-02-13 Bin Cheng <bin.ch...@arm.com> > >>> > >>> PR tree-optimization/79347 > >>> * gcc.dg/vect/pr79347.c: New test. > >> > >>> diff --git a/gcc/testsuite/gcc.dg/vect/pr79347.c > >>> b/gcc/testsuite/gcc.dg/vect/pr79347.c > >>> new file mode 100644 > >>> index 0000000..586c638 > >>> --- /dev/null > >>> +++ b/gcc/testsuite/gcc.dg/vect/pr79347.c > >>> @@ -0,0 +1,13 @@ > >>> +/* { dg-do compile } */ > >>> +/* { dg-require-effective-target vect_int } */ > >>> +/* { dg-additional-options "-fdump-tree-vect-all" } */ > >>> + > >>> +short *a; > >>> +int c; > >>> +void n(void) > >>> +{ > >>> + for (int i = 0; i<c;i++) > >>> + a[i]++; > >>> +} > >> > >> Thanks for fixing the prologue. I think there is still one extra problem > >> in the vectorizer. > >> With the internal vectorized loop I now see: > >> > >> ;; basic block 9, loop depth 1, count 0, freq 956, maybe hot > >> ;; Invalid sum of incoming frequencies 1961, should be 956 > >> ;; prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED) > >> ;; pred: 10 [100.0%] (FALLTHRU,DFS_BACK,EXECUTABLE) > >> ;; 8 [100.0%] (FALLTHRU) > >> # i_18 = PHI <i_14(10), i_38(8)> > >> # vectp_a.13_66 = PHI <vectp_a.13_67(10), vectp_a.14_64(8)> > >> # vectp_a.19_75 = PHI <vectp_a.19_76(10), vectp_a.20_73(8)> > >> # ivtmp_78 = PHI <ivtmp_79(10), 0(8)> > >> _2 = (long unsigned int) i_18; > >> _3 = _2 * 2; > >> _4 = a.0_1 + _3; > >> vect__5.15_68 = MEM[(short int *)vectp_a.13_66]; > >> _5 = *_4; > >> vect__6.16_69 = VIEW_CONVERT_EXPR<vector(8) unsigned > >> short>(vect__5.15_68); > >> _6 = (unsigned short) _5; > >> vect__7.17_71 = vect__6.16_69 + vect_cst__70; > >> _7 = _6 + 1; > >> vect__8.18_72 = VIEW_CONVERT_EXPR<vector(8) short int>(vect__7.17_71); > >> _8 = (short int) _7; > >> MEM[(short int *)vectp_a.19_75] = vect__8.18_72; > >> i_14 = i_18 + 1; > >> vectp_a.13_67 = vectp_a.13_66 + 16; > >> vectp_a.19_76 = vectp_a.19_75 + 16; > >> ivtmp_79 = ivtmp_78 + 1; > >> if (ivtmp_79 < bnd.10_59) > >> goto <bb 10>; [85.00%] > >> else > >> goto <bb 11>; [15.00%] > >> > >> So it seems that the frequency of the loop itself is unrealistically > >> scaled down. > >> Before vetorizing the frequency is 8500 and predicted number of iterations > >> is > >> 6.6. Now the loop is intereed via BB 8 with frequency 1148, so the loop, > >> by > >> exit probability exits with 15% probability and thus still has 6.6 > >> iterations, > >> but by BB frequencies its body executes fewer times than the preheader. > >> > >> Now this is a fragile area vectirizing loop should scale number of > >> iterations down > >> 8 times. However guessed CFG profiles are always very "flat". Of course > >> if loop iterated 6.6 times at the average vectorizing would not make any > >> sense. > >> Making guessed profiles less flat is unrealistic, because average loop > >> iterates few times, > >> but of course while vectorizing we make additional guess that the > >> vectorizable loops > >> matters and the guessed profile is probably unrealistic. > > That's what I mentioned in the original patch. Vectorizer calls > > scale_loop_profile in > > function vect_transform_loop to scale down loop's frequency regardless > > mismatch > > between loop and preheader/exit basic blocks. In fact, after this > > patch all mismatches > > in vectorizer are introduced by this. I don't see any way to keep > > consistency beween > > vectorized loop and the rest program without visiting whole CFG. So > > shall we skip > > scaling down profile counters for vectorized loop? > > > >> > >> GCC 6 seems however bit more consistent. > >>> +/* Apply probability PROB to basic block BB and its single succ edge. */ > >>> + > >>> +static void > >>> +apply_probability_for_bb (basic_block bb, int prob) > >>> +{ > >>> + bb->frequency = apply_probability (bb->frequency, prob); > >>> + bb->count = apply_probability (bb->count, prob); > >>> + gcc_assert (single_succ_p (bb)); > >>> + single_succ_edge (bb)->count = bb->count; > >>> +} > >>> + > >>> /* Function vect_do_peeling. > >>> > >>> Input: > >>> @@ -1690,7 +1701,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree > >>> niters, tree nitersm1, > >>> may be preferred. */ > >>> basic_block anchor = loop_preheader_edge (loop)->src; > >>> if (skip_vector) > >>> - split_edge (loop_preheader_edge (loop)); > >>> + { > >>> + split_edge (loop_preheader_edge (loop)); > >>> + > >>> + /* Due to the order in which we peel prolog and epilog, we first > >>> + propagate probability to the whole loop. The purpose is to > >>> + avoid adjusting probabilities of both prolog and vector loops > >>> + separately. Note in this case, the probability of epilog loop > >>> + needs to be scaled back later. */ > >>> + basic_block bb_before_loop = loop_preheader_edge (loop)->src; > >>> + apply_probability_for_bb (bb_before_loop, prob_vector); > >> Aha, this is the bit I missed while trying to fix it myself. > >> I scale_bbs_frequencies_int(&bb_before_loop, 1, prob_vector, > >> REG_BR_PROB_BASE) > >> to do this. I plan to revamp API for this next stage1, but lets keep this > >> consistent. > > Will change that. > > > >> Path is OK with this change and ... > >>> + scale_loop_profile (loop, prob_vector, bound); > > Can't do this with current interface because I need to scale up > > frequency after skip_vector, > > and sub-calls in scale_loop_profile checks validity for prob and only > > allows scaling down... > > Hi, > Attachment is the updated patch. Bootstrap and tested. Is it OK? > > Thanks, > bin > > 2017-02-13 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/79347 > * tree-vect-loop-manip.c (vect_do_peeling): Maintain profile > counters during peeling. > > gcc/testsuite/ChangeLog > 2017-02-13 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/79347 > * gcc.dg/vect/pr79347.c: New test.
OK, thanks! Honza