Re: [PATCH 1/3][vect] Add main vectorized loop unrolling

Richard Biener via Gcc-patches Tue, 21 Sep 2021 23:14:33 -0700

On Tue, 21 Sep 2021, Andre Vieira (lists) wrote:

> Hi Richi,
> 
> Thanks for the review, see below some questions.
> 
> On 21/09/2021 13:30, Richard Biener wrote:
> > On Fri, 17 Sep 2021, Andre Vieira (lists) wrote:
> >
> >> Hi all,
> >>
> >> This patch adds the ability to define a target hook to unroll the main
> >> vectorized loop. It also introduces --param's vect-unroll and
> >> vect-unroll-reductions to control this through a command-line. I found this
> >> useful to experiment and believe can help when tuning, so I decided to
> >> leave
> >> it in.
> >> We only unroll the main loop and have disabled unrolling epilogues for now.
> >> We
> >> also do not support unrolling of any loop that has a negative step and we
> >> do
> >> not support unrolling a loop with any reduction other than a
> >> TREE_CODE_REDUCTION.
> >>
> >> Bootstrapped and regression tested on aarch64-linux-gnu as part of the
> >> series.
> > I wonder why we want to change the vector modes used for the epilogue,
> > we're either making it more likely to need to fall through to the
> > scalar epilogue or require another vectorized epilogue.
> I don't quite understand what you mean by change the vector modes for the
> epilogue. I don't think we do.
> If you are referring to:
>       /* If we are unrolling, try all VECTOR_MODES for the epilogue.  */
>       if (loop_vinfo->par_unrolling_factor > 1)
>         {
>           next_vector_mode = vector_modes[0];
>           mode_i = 1;
> 
>           if (dump_enabled_p ())
>         dump_printf_loc (MSG_NOTE, vect_location,
>                  "***** Re-trying analysis with vector mode"
>                  " %s for epilogue with partial vectors.\n",
>                  GET_MODE_NAME (next_vector_mode));
>           continue;
>         }
> 
> That just forces trying the vector modes we've tried before. Though I might
> need to revisit this now I think about it. I'm afraid it might be possible for
> this to generate an epilogue with a vf that is not lower than that of the main
> loop, but I'd need to think about this again.
> 
> Either way I don't think this changes the vector modes used for the epilogue.
> But maybe I'm just missing your point here.


Yes, I was refering to the above which suggests that when we vectorize
the main loop with V4SF but unroll then we try vectorizing the
epilogue with V4SF as well (but not unrolled).  I think that's
premature (not sure if you try V8SF if the main loop was V4SF but
unrolled 4 times).

> > That said, for simplicity I'd only change the VF of the main loop.
> >
> > There I wonder why you need to change vect_update_vf_for_slp or
> > vect_determine_partial_vectors_and_peeling and why it's not enough
> > to adjust the VF in a single place, I'd do that here:
> >
> >    /* This is the point where we can re-start analysis with SLP forced off.
> > */
> > start_over:
> >
> >    /* Now the vectorization factor is final.  */
> >    poly_uint64 vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> >    gcc_assert (known_ne (vectorization_factor, 0U));
> >
> > ---->  call vect_update_vf_for_unroll ()
> I can move it there, it would indeed remove the need for the change to
> vect_update_vf_for_slp, the change to
> vect_determine_partial_vectors_and_peeling would still be required I think. It
> is meant to disable using partial vectors in an unrolled loop.

Why would we disable the use of partial vectors in an unrolled loop?

> > note there's also loop->unroll (from #pragma GCC unroll) which we
> > could include in what you look at in vect_unroll_value.
> >
> > I don't like add_stmt_cost_for_unroll - how should a target go
> > and decide based on what it is fed?  You could as well feed it
> > the scalar body or the vinfo so it can get a shot at all
> > the vectorizers meta data - but feeding it individual stmt_infos
> > does not add any meaningful abstraction and thus what's the
> > point?
> I am still working on tuning our backend hook, but the way it works is it
> estimates how many load, store and general ops are required for the vectorized
> loop based on these.
> > I _think_ what would make some sense is when we actually cost
> > the vector body (with the not unrolled VF) ask the target
> > "well, how about unrolling this?" because there it has the
> > chance to look at the actual vector stmts produced (in "cost form").
> > And if the target answers "yeah - go ahead and try x4" we signal
> > that to the iteration and have "mode N with x4 unroll" validated and
> > costed.
> >
> > So instead of a new target hook amend the finish_cost hook to
> > produce a suggested unroll value and cost both the unrolled and
> > not unrolled body.
> >
> > Sorry for steering in a different direction ;)
> The reason we decided to do this early and not after cost is because
> 'vect_prune_runtime_alias_test_list' and 'vect_enhance_data_refs_alignment'
> require the VF and if you suddenly raise that the alias analysis could become
> invalid.

Sure but I'm suggesting you keep the not unrolled body as one way of
costed vectorization but then if the target says "try unrolling"
re-do the analysis with the same mode but a larger VF.  Just like
we iterate over vector modes you'll now iterate over pairs of
vector mode + VF (unroll factor).  It's not about re-using the costing
it's about using costing that is actually relevant and also to avoid
targets inventing two distinct separate costings - a target (powerpc)
might already compute load/store density and other stuff for the main
costing so it should have an idea whether doubling or triplicating is OK.

Richard.

> An initial implementation did do it later for that very reason that we could
> reuse the cost calculations and AArch64 already computed these 'ops' after
> Richard Sandiford's patches.
> But yeah ... the above kinda led me to rewrite it this way.
> 
> >
> > Thanks,
> > Richard.
> >
> >
> >
> >> gcc/ChangeLog:
> >>
> >>          * doc/tm.texi: Document TARGET_VECTORIZE_UNROLL_FACTOR
> >>          and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL.
> >>          * doc/tm.texi.in: Add entries for target hooks above.
> >>          * params.opt: Add vect-unroll and vect-unroll-reductions
> >> parameters.
> >>          * target.def: Define hooks TARGET_VECTORIZE_UNROLL_FACTOR
> >>          and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL.
> >>          * targhooks.c (default_add_stmt_cost_for_unroll): New.
> >>          (default_unroll_factor): Likewise.
> >>          * targhooks.h (default_add_stmt_cost_for_unroll): Likewise.
> >>          (default_unroll_factor): Likewise.
> >>          * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
> >>          par_unrolling_factor.
> >>          (vect_update_vf_for_slp): Use unrolling factor to update
> >> vectorization
> >>          factor.
> >>          (vect_determine_partial_vectors_and_peeling): Account for
> >> unrolling.
> >>         (vect_determine_unroll_factor): Determine how much to unroll
> >> vectorized
> >>          main loop.
> >>          (vect_analyze_loop_2): Call vect_determine_unroll_factor.
> >>          (vect_analyze_loop): Allow for epilogue vectorization when
> >> unrolling
> >>          and rewalk vector_mode array for the epilogues.
> >>          (vectorizable_reduction): Disable single_defuse_cycle when
> >> unrolling.
> >>          * tree-vectorizer.h (vect_unroll_value): Declare
> >>  par_unrolling_factor
> >>          as a member of loop_vec_info.
> >>
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH 1/3][vect] Add main vectorized loop unrolling

Reply via email to