> On Tue, Feb 14, 2017 at 2:13 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> > On Tue, Feb 14, 2017 at 1:57 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
> >>> Thanks,
> >>> bin
> >>> 2017-02-13  Bin Cheng  <bin.ch...@arm.com>
> >>>
> >>>       PR tree-optimization/79347
> >>>       * tree-vect-loop-manip.c (apply_probability_for_bb): New function.
> >>>       (vect_do_peeling): Maintain profile counters during peeling.
> >>>
> >>> gcc/testsuite/ChangeLog
> >>> 2017-02-13  Bin Cheng  <bin.ch...@arm.com>
> >>>
> >>>       PR tree-optimization/79347
> >>>       * gcc.dg/vect/pr79347.c: New test.
> >>
> >>> diff --git a/gcc/testsuite/gcc.dg/vect/pr79347.c 
> >>> b/gcc/testsuite/gcc.dg/vect/pr79347.c
> >>> new file mode 100644
> >>> index 0000000..586c638
> >>> --- /dev/null
> >>> +++ b/gcc/testsuite/gcc.dg/vect/pr79347.c
> >>> @@ -0,0 +1,13 @@
> >>> +/* { dg-do compile } */
> >>> +/* { dg-require-effective-target vect_int } */
> >>> +/* { dg-additional-options "-fdump-tree-vect-all" } */
> >>> +
> >>> +short *a;
> >>> +int c;
> >>> +void n(void)
> >>> +{
> >>> +  for (int i = 0; i<c;i++)
> >>> +    a[i]++;
> >>> +}
> >>
> >> Thanks for fixing the prologue.  I think there is still one extra problem 
> >> in the vectorizer.
> >> With the internal vectorized loop I now see:
> >>
> >> ;;   basic block 9, loop depth 1, count 0, freq 956, maybe hot
> >> ;;   Invalid sum of incoming frequencies 1961, should be 956
> >> ;;    prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED)
> >> ;;    pred:       10 [100.0%]  (FALLTHRU,DFS_BACK,EXECUTABLE)
> >> ;;                8 [100.0%]  (FALLTHRU)
> >>   # i_18 = PHI <i_14(10), i_38(8)>
> >>   # vectp_a.13_66 = PHI <vectp_a.13_67(10), vectp_a.14_64(8)>
> >>   # vectp_a.19_75 = PHI <vectp_a.19_76(10), vectp_a.20_73(8)>
> >>   # ivtmp_78 = PHI <ivtmp_79(10), 0(8)>
> >>   _2 = (long unsigned int) i_18;
> >>   _3 = _2 * 2;
> >>   _4 = a.0_1 + _3;
> >>   vect__5.15_68 = MEM[(short int *)vectp_a.13_66];
> >>   _5 = *_4;
> >>   vect__6.16_69 = VIEW_CONVERT_EXPR<vector(8) unsigned 
> >> short>(vect__5.15_68);
> >>   _6 = (unsigned short) _5;
> >>   vect__7.17_71 = vect__6.16_69 + vect_cst__70;
> >>   _7 = _6 + 1;
> >>   vect__8.18_72 = VIEW_CONVERT_EXPR<vector(8) short int>(vect__7.17_71);
> >>   _8 = (short int) _7;
> >>   MEM[(short int *)vectp_a.19_75] = vect__8.18_72;
> >>   i_14 = i_18 + 1;
> >>   vectp_a.13_67 = vectp_a.13_66 + 16;
> >>   vectp_a.19_76 = vectp_a.19_75 + 16;
> >>   ivtmp_79 = ivtmp_78 + 1;
> >>   if (ivtmp_79 < bnd.10_59)
> >>     goto <bb 10>; [85.00%]
> >>   else
> >>     goto <bb 11>; [15.00%]
> >>
> >> So it seems that the frequency of the loop itself is unrealistically 
> >> scaled down.
> >> Before vetorizing the frequency is 8500 and predicted number of iterations 
> >> is
> >> 6.6.  Now the loop is intereed via BB 8 with frequency 1148, so the loop, 
> >> by
> >> exit probability exits with 15% probability and thus still has 6.6 
> >> iterations,
> >> but by BB frequencies its body executes fewer times than the preheader.
> >>
> >> Now this is a fragile area vectirizing loop should scale number of 
> >> iterations down
> >> 8 times. However guessed CFG profiles are always very "flat". Of course
> >> if loop iterated 6.6 times at the average vectorizing would not make any 
> >> sense.
> >> Making guessed profiles less flat is unrealistic, because average loop 
> >> iterates few times,
> >> but of course while vectorizing we make additional guess that the 
> >> vectorizable loops
> >> matters and the guessed profile is probably unrealistic.
> > That's what I mentioned in the original patch.  Vectorizer calls
> > scale_loop_profile in
> > function vect_transform_loop to scale down loop's frequency regardless 
> > mismatch
> > between loop and preheader/exit basic blocks.  In fact, after this
> > patch all mismatches
> > in vectorizer are introduced by this.  I don't see any way to keep
> > consistency beween
> > vectorized loop and the rest program without visiting whole CFG.  So
> > shall we skip
> > scaling down profile counters for vectorized loop?
> >
> >>
> >> GCC 6 seems however bit more consistent.
> >>> +/* Apply probability PROB to basic block BB and its single succ edge.  */
> >>> +
> >>> +static void
> >>> +apply_probability_for_bb (basic_block bb, int prob)
> >>> +{
> >>> +  bb->frequency = apply_probability (bb->frequency, prob);
> >>> +  bb->count = apply_probability (bb->count, prob);
> >>> +  gcc_assert (single_succ_p (bb));
> >>> +  single_succ_edge (bb)->count = bb->count;
> >>> +}
> >>> +
> >>>  /* Function vect_do_peeling.
> >>>
> >>>     Input:
> >>> @@ -1690,7 +1701,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
> >>> niters, tree nitersm1,
> >>>       may be preferred.  */
> >>>    basic_block anchor = loop_preheader_edge (loop)->src;
> >>>    if (skip_vector)
> >>> -    split_edge (loop_preheader_edge (loop));
> >>> +    {
> >>> +      split_edge (loop_preheader_edge (loop));
> >>> +
> >>> +      /* Due to the order in which we peel prolog and epilog, we first
> >>> +      propagate probability to the whole loop.  The purpose is to
> >>> +      avoid adjusting probabilities of both prolog and vector loops
> >>> +      separately.  Note in this case, the probability of epilog loop
> >>> +      needs to be scaled back later.  */
> >>> +      basic_block bb_before_loop = loop_preheader_edge (loop)->src;
> >>> +      apply_probability_for_bb (bb_before_loop, prob_vector);
> >> Aha, this is the bit I missed while trying to fix it myself.
> >> I scale_bbs_frequencies_int(&bb_before_loop, 1, prob_vector, 
> >> REG_BR_PROB_BASE)
> >> to do this.  I plan to revamp API for this next stage1, but lets keep this 
> >> consistent.
> > Will change that.
> >
> >> Path is OK with this change and ...
> >>> +      scale_loop_profile (loop, prob_vector, bound);
> > Can't do this with current interface because I need to scale up
> > frequency after skip_vector,
> > and sub-calls in scale_loop_profile checks validity for prob and only
> > allows scaling down...
> 
> Hi,
> Attachment is the updated patch.  Bootstrap and tested.  Is it OK?
> 
> Thanks,
> bin
> 
> 2017-02-13  Bin Cheng  <bin.ch...@arm.com>
> 
>     PR tree-optimization/79347
>     * tree-vect-loop-manip.c (vect_do_peeling): Maintain profile
>     counters during peeling.
> 
> gcc/testsuite/ChangeLog
> 2017-02-13  Bin Cheng  <bin.ch...@arm.com>
> 
>     PR tree-optimization/79347
>     * gcc.dg/vect/pr79347.c: New test.

OK, thanks!

Honza

Reply via email to