On Thu, 10 Oct 2019, Andre Vieira (lists) wrote: > Hi, > > After all the discussions and respins I now believe this patch is close to > what we envisioned. > > This patch achieves two things when vect-epilogues-nomask=1: > 1) It analyzes the original loop for each supported vector size and saves this > analysis per loop, as well as the vector sizes we know we can vectorize the > loop for. > 2) When loop versioning it uses the 'skip_vector' code path to vectorize the > epilogue, and uses the lowest versioning threshold between the main and > epilogue's. > > As side effects of this patch I also changed ensure_base_align to only update > the alignment if the new alignment is lower than the current one. This > function already did that if the object was a symbol, now it behaves this way > for any object. > > I bootstrapped this patch with both vect-epilogues-nomask turned on and off on > x86_64 (AVX512) and aarch64. Regression tests looked good. > > Is this OK for trunk?
+ + /* Keep track of vector sizes we know we can vectorize the epilogue with. */ + vector_sizes epilogue_vsizes; }; please don't enlarge struct loop, instead track this somewhere in the vectorizer (in loop_vinfo? I see you already have epilogue_vinfos there - so the loop_vinfo simply lacks convenient access to the vector_size?) I don't see any use that could be trivially adjusted to look at a loop_vinfo member instead. For the vect_update_inits_of_drs this means that we'd possibly do less CSE. Not sure if really an issue. You use LOOP_VINFO_EPILOGUE_P sometimes and sometimes LOOP_VINFO_ORIG_LOOP_INFO, please change predicates to LOOP_VINFO_EPILOGUE_P. @@ -2466,15 +2461,62 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, else niters_prolog = build_int_cst (type, 0); + loop_vec_info epilogue_vinfo = NULL; + if (vect_epilogues) + { ... + vect_epilogues = false; + } + I don't understand what all this does - it clearly needs a comment. Maybe the overall comment of the function should be amended with an overview of how we handle [multiple] epilogue loop vectorization? + + if (epilogue_any_upper_bound && prolog_peeling >= 0) + { + epilog->any_upper_bound = true; + epilog->nb_iterations_upper_bound = eiters + 1; + } + comment missing. How can prolog_peeling be < 0? We likely didn't set the upper bound because we don't know it in the case we skipped the vector loop (skip_vector)? So make sure to not introduce wrong-code issues here - maybe do this optimization as followup? class loop * -vect_loop_versioning (loop_vec_info loop_vinfo, - unsigned int th, bool check_profitability, - poly_uint64 versioning_threshold) +vect_loop_versioning (loop_vec_info loop_vinfo) { class loop *loop = LOOP_VINFO_LOOP (loop_vinfo), *nloop; class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo); @@ -2988,10 +3151,15 @@ vect_loop_versioning (loop_vec_info loop_vinfo, bool version_align = LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo); bool version_alias = LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo); bool version_niter = LOOP_REQUIRES_VERSIONING_FOR_NITERS (loop_vinfo); + poly_uint64 versioning_threshold + = LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo); tree version_simd_if_cond = LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND (loop_vinfo); + unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo); - if (check_profitability) + if (th >= vect_vf_for_cost (loop_vinfo) + && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) + && !ordered_p (th, versioning_threshold)) cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters, build_int_cst (TREE_TYPE (scalar_loop_iters), th - 1)); split out this refactoring - preapproved. @@ -1726,7 +1729,13 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo) return 0; } - HOST_WIDE_INT estimated_niter = estimated_stmt_executions_int (loop); + HOST_WIDE_INT estimated_niter = -1; + + if (LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo)) + estimated_niter + = vect_vf_for_cost (LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo)) - 1; + if (estimated_niter == -1) + estimated_niter = estimated_stmt_executions_int (loop); if (estimated_niter == -1) estimated_niter = likely_max_stmt_executions_int (loop); if (estimated_niter != -1 it's clearer if the old code is completely in a else {} path even though vect_vf_for_cost - 1 should never be -1. +/* Decides whether we need to create an epilogue loop to handle + remaining scalar iterations and sets PEELING_FOR_NITERS accordingly. */ + +void +determine_peel_for_niter (loop_vec_info loop_vinfo) +{ + extra vertical space + unsigned HOST_WIDE_INT const_vf; + HOST_WIDE_INT max_niter if it's a 1:1 copy outlined then split it out - preapproved (so further reviews get smaller patches ;)) I'd add a LOOP_VINFO_PEELING_FOR_NITER () = false as final else since that's what we do by default? - if (LOOP_REQUIRES_VERSIONING (loop_vinfo)) + if (LOOP_REQUIRES_VERSIONING (loop_vinfo) + || ((orig_loop_vinfo = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo)) + && LOOP_REQUIRES_VERSIONING (orig_loop_vinfo))) not sure why we need to do this for epilouges? + + /* Use the same condition as vect_transform_loop to decide when to use + the cost to determine a versioning threshold. */ + if (th >= vect_vf_for_cost (loop_vinfo) + && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) + && ordered_p (th, niters_th)) + niters_th = ordered_max (poly_uint64 (th), niters_th); that's an independent change, right? Please split out, it's pre-approved if it tests OK separately. +static tree +replace_ops (tree op, hash_map<tree, tree> &mapping) +{ I'm quite sure I've seen such beast elsewhere ;) simplify_replace_tree comes up first (not a 1:1 match but hints at a possible tree sharing issue in your variant). @@ -8497,11 +8588,11 @@ vect_transform_loop (loop_vec_info loop_vinfo) if (th >= vect_vf_for_cost (loop_vinfo) && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)) { - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "Profitability threshold is %d loop iterations.\n", - th); - check_profitability = true; + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Profitability threshold is %d loop iterations.\n", + th); + check_profitability = true; } /* Make sure there exists a single-predecessor exit bb. Do this before obvious (separate) + tree advance; epilogue = vect_do_peeling (loop_vinfo, niters, nitersm1, &niters_vector, &step_vector, &niters_vector_mult_vf, th, - check_profitability, niters_no_overflow); + check_profitability, niters_no_overflow, + &advance); + + if (epilogue) + { + basic_block *orig_bbs = get_loop_body (loop); + loop_vec_info epilogue_vinfo = loop_vec_info_for_loop (epilogue); ... please record this in vect_do_peeling itself and store the orig_stmts/drs/etc. in the epilogue loop_vinfo and ... + /* We are done vectorizing the main loop, so now we update the epilogues + stmt_vec_info's. At the same time we set the gimple UID of each + statement in the epilogue, as these are used to look them up in the + epilogues loop_vec_info later. We also keep track of what ... split this out to a new function. I wonder why you need to record the DRs, are they not available via ->datarefs and lookup_dr ()? diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 601a6f55fbff388c89f88d994e790aebf2bf960e..201549da6c0cbae0797a23ae1b8967b9895505e9 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -6288,7 +6288,7 @@ ensure_base_align (dr_vec_info *dr_info) if (decl_in_symtab_p (base_decl)) symtab_node::get (base_decl)->increase_alignment (align_base_to); - else + else if (DECL_ALIGN (base_decl) < align_base_to) { SET_DECL_ALIGN (base_decl, align_base_to); DECL_USER_ALIGN (base_decl) = 1; split out - preapproved. Still have to go over the main loop doing the analysis/transform. Thanks, it looks really promising (albeit exepectedly ugly due to the data rewriting). Richard. > gcc/ChangeLog: > 2019-10-10 Andre Vieira <andre.simoesdiasvie...@arm.com> > > PR 88915 > * cfgloop.h (loop): Add epilogue_vsizes member. > * cfgloop.c (flow_loop_free): Release epilogue_vsizes. > (alloc_loop): Initialize epilogue_vsizes. > * gentype.c (main): Add poly_uint64 type and vector_sizes to > generator. > * tree-vect-loop.c (vect_get_loop_niters): Make externally visible. > (_loop_vec_info): Initialize epilogue_vinfos. > (~_loop_vec_info): Release epilogue_vinfos. > (vect_analyze_loop_costing): Use knowledge of main VF to estimate > number of iterations of epilogue. > (determine_peel_for_niter): New. Outlined code to re-use in two > places. > (vect_analyze_loop_2): Adapt to analyse main loop for all supported > vector sizes when vect-epilogues-nomask=1. Also keep track of lowest > versioning threshold needed for main loop. > (vect_analyze_loop): Likewise. > (replace_ops): New helper function. > (vect_transform_loop): When vectorizing epilogues re-use analysis done > on main loop and update necessary information. > * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert > stmts on loop preheader edge. > (vect_do_peeling): Enable skip-vectors when doing loop versioning if > we decided to vectorize epilogues. Update epilogues NITERS and > construct ADVANCE to update epilogues data references where needed. > (vect_loop_versioning): Moved decision to check_profitability > based on cost model. > * tree-vect-stmts.c (ensure_base_align): Only update alignment > if new alignment is lower. > * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos member. > (vect_loop_versioning, vect_do_peeling, vect_get_loop_niters, > vect_update_inits_of_drs, determine_peel_for_niter, > vect_analyze_loop): Add or update declarations. > * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already > create loop_vec_info's for epilogues when available. Otherwise analyse > epilogue separately. > > > > Cheers, > Andre > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 247165 (AG München)