On Thu, Oct 19, 2017 at 10:48 AM, Richard Sandiford
<richard.sandif...@linaro.org> wrote:
> Richard Biener <richard.guent...@gmail.com> writes:
>> On Thu, Oct 19, 2017 at 12:28 AM, Richard Sandiford
>> <richard.sandif...@linaro.org> wrote:
>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>> On Fri, Oct 13, 2017 at 4:10 PM, Richard Sandiford
>>>> <richard.sandif...@linaro.org> wrote:
>>>>> Normally we adjust the vector loop so that it iterates:
>>>>>
>>>>>    (original number of scalar iterations - number of peels) / VF
>>>>>
>>>>> times, enforcing this using an IV that starts at zero and increments
>>>>> by one each iteration. However, dividing by VF would be expensive
>>>>> for variable VF, so this patch adds an alternative in which the IV
>>>>> increments by VF each iteration instead. We then need to take care
>>>>> to handle possible overflow in the IV.
>>>>
>>>> Hmm, why do you need to handle possible overflow? Doesn't the
>>>> original loop have a natural IV that evolves like this? After all we
>>>> can compute an expression for niters of the scalar loop.
>>>
>>> The problem comes with loops like:
>>>
>>>    unsigned char i = 0;
>>>    do
>>>      {
>>>        ...
>>>        i--;
>>>      }
>>>    while (i != 0);
>>>
>>> The loop statements execute 256 times and the latch executes 255 times.
>>> LOOP_VINFO_NITERSM1 is then 255 but LOOP_VINFO_NITERS (stored as an
>>> unsigned char) is 0.
>>
>> Yes, that's an existing issue and the reason why I introduced
>> NITERSM1. All remaining uses of NITERS should really go away
>> because of this corner-case. So you are introducing a new user?
>
> It's not really an NITERSM1 vs. NITERS thing. We'd get the same
> result/have the same problem with NITERSM1 - (STEP - 1) instead
> of NITERS - STEP, namely:
>
> - the new IV uses the same type as NITERS
> - we only want the loop to iterate if there are at least STEP scalar
>   iterations to go
> - this means that the natural limit is "IV <= NITERS - STEP"
>   or "IV <= NITERSM1 - (STEP - 1)" (both equivalent)
> - the loop is only guaranteed to terminate if the IV can hit
>   a value STEP times higher than that, i.e. "IV == NITERS - STEP"
>   must be followed by an iteration in which the branch-back
>   condition is false
> - but if NITERS can't represent the actual number of iterations,
>   then there is no value STEP times higher than that
> - we cope with this by starting the IV at -1 and using a limit
>   of "IV < NITERS - STEP" i.e. "IV <= NITERSM1 - STEP".
>
> So you could see this as using a limit based on NITERSM1 with a
> start of -1, although the "< NITERS - STEP" avoids the need to
> subtract 1 at runtime.
>
> But it seems better to use a 0-based IV when we can, since that
> leads to more natural ivopts opportunities. That's why the loop
> tests for the overflow case and only uses the -1 based IV when
> necessary.
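
For anyone following the thread, the corner case above can be reproduced outside GCC. What follows is a minimal standalone sketch, not code from the patch: it models the IV as a uint8_t, and the function names count_iters_zero_based and count_iters_minus1_based, as well as the cap argument that stops the otherwise endless loop, are made up for illustration. It assumes NITERS is either 0 (standing for 256 scalar iterations) or at least STEP, as the discussion above does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 0-based IV with limit "IV <= NITERS - STEP".
   All arithmetic is done in 8 bits, like an unsigned char NITERS.
   CAP is only there so that the demonstration stops when the
   branch-back condition never becomes false.  Returns the number of
   times the loop body ran.  */
static unsigned
count_iters_zero_based (uint8_t niters, uint8_t step, unsigned cap)
{
  assert (niters == 0 || niters >= step);
  uint8_t limit = (uint8_t) (niters - step);
  uint8_t x = 0;
  unsigned count = 0;
  do
    {
      count++;
      x = (uint8_t) (x + step);
    }
  while (x <= limit && count < cap);
  return count;
}

/* Hypothetical model of the -1 based IV with limit "IV < NITERS - STEP".
   Returns the number of times the loop body ran.  */
static unsigned
count_iters_minus1_based (uint8_t niters, uint8_t step)
{
  assert (niters == 0 || niters >= step);
  uint8_t limit = (uint8_t) (niters - step);
  uint8_t x = (uint8_t) -1;
  unsigned count = 0;
  do
    {
      count++;
      x = (uint8_t) (x + step);
    }
  while (x < limit);
  return count;
}
```

With NITERS == 0 and STEP == 4, the 0-based loop never sees its branch-back condition become false and only stops at the cap, whereas the -1 based loop runs 64 times, i.e. 256 / 4. For a NITERS that did not wrap, such as 10, both forms run 10 / 4 == 2 times.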

I see. Thanks for the clarification.

Richard.

> Thanks,
> Richard
>
>>
>> Richard.
>>
>>> This leads to things like:
>>>
>>>   /* Constant case. */
>>>   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>>>     {
>>>       tree cst_niters = LOOP_VINFO_NITERS (loop_vinfo);
>>>       tree cst_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
>>>
>>>       gcc_assert (TREE_CODE (cst_niters) == INTEGER_CST);
>>>       gcc_assert (TREE_CODE (cst_nitersm1) == INTEGER_CST);
>>>       if (wi::to_widest (cst_nitersm1) < wi::to_widest (cst_niters))
>>>         return true;
>>>     }
>>>
>>> in loop_niters_no_overflow.
>>>
>>>>> The new mechanism isn't used yet; a later patch replaces the
>>>>> "if (1)" with a check for variable VF. If the patch is OK, I'll
>>>>> hold off applying it until the follow-on is ready to go in.
>>>>
>>>> I indeed don't like code that isn't exercised. Otherwise looks reasonable.
>>>
>>> Thanks.
>>>
>>> Richard
>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>>>>> OK to install when the time comes?
>>>>>
>>>>> Richard
>>>>>
>>>>>
>>>>> 2017-10-13 Richard Sandiford <richard.sandif...@linaro.org>
>>>>>
>>>>> gcc/
>>>>>         * tree-vect-loop-manip.c: Include gimple-fold.h.
>>>>>         (slpeel_make_loop_iterate_ntimes): Add step, final_iv and
>>>>>         niters_maybe_zero parameters. Handle other cases besides a step of 1.
>>>>>         (vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
>>>>>         Add a path that uses a step of VF instead of 1, but disable it
>>>>>         for now.
>>>>>         (vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
>>>>>         and niters_no_overflow parameters. Update calls to
>>>>>         slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
>>>>>         Create a new SSA name if the latter chooses to use a step other
>>>>>         than one, and return it via niters_vector_mult_vf_var.
>>>>>         * tree-vect-loop.c (vect_transform_loop): Update calls to
>>>>>         vect_do_peeling, vect_gen_vector_loop_niters and
>>>>>         slpeel_make_loop_iterate_ntimes.
>>>>>         * tree-vectorizer.h (slpeel_make_loop_iterate_ntimes, vect_do_peeling)
>>>>>         (vect_gen_vector_loop_niters): Update declarations after above changes.
>>>>>
>>>>> Index: gcc/tree-vect-loop-manip.c
>>>>> ===================================================================
>>>>> --- gcc/tree-vect-loop-manip.c 2017-10-13 15:01:40.144777367 +0100
>>>>> +++ gcc/tree-vect-loop-manip.c 2017-10-13 15:01:40.296014347 +0100
>>>>> @@ -41,6 +41,7 @@ Software Foundation; either version 3, o
>>>>>  #include "tree-scalar-evolution.h"
>>>>>  #include "tree-vectorizer.h"
>>>>>  #include "tree-ssa-loop-ivopts.h"
>>>>> +#include "gimple-fold.h"
>>>>>
>>>>>
>>>>>  /*************************************************************************
>>>>>    Simple Loop Peeling Utilities
>>>>> @@ -247,30 +248,115 @@ adjust_phi_and_debug_stmts (gimple *upda
>>>>>                         gimple_bb (update_phi));
>>>>>  }
>>>>>
>>>>> -/* Make the LOOP iterate NITERS times. This is done by adding a new IV
>>>>> -   that starts at zero, increases by one and its limit is NITERS.
>>>>> +/* Make LOOP iterate N == (NITERS - STEP) / STEP + 1 times,
>>>>> +   where NITERS is known to be outside the range [1, STEP - 1].
>>>>> +   This is equivalent to making the loop execute NITERS / STEP
>>>>> +   times when NITERS is nonzero and (1 << M) / STEP times otherwise,
>>>>> +   where M is the precision of NITERS.
>>>>> +
>>>>> +   NITERS_MAYBE_ZERO is true if NITERS can be zero, false if it is known
>>>>> +   to be >= STEP. In the latter case N is always NITERS / STEP.
>>>>> +
>>>>> +   If FINAL_IV is nonnull, it is an SSA name that should be set to
>>>>> +   N * STEP on exit from the loop.
>>>>>
>>>>>     Assumption: the exit-condition of LOOP is the last stmt in the loop. */
>>>>>
>>>>>  void
>>>>> -slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters)
>>>>> +slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters, tree step,
>>>>> +                                 tree final_iv, bool niters_maybe_zero)
>>>>>  {
>>>>>    tree indx_before_incr, indx_after_incr;
>>>>>    gcond *cond_stmt;
>>>>>    gcond *orig_cond;
>>>>> +  edge pe = loop_preheader_edge (loop);
>>>>>    edge exit_edge = single_exit (loop);
>>>>>    gimple_stmt_iterator loop_cond_gsi;
>>>>>    gimple_stmt_iterator incr_gsi;
>>>>>    bool insert_after;
>>>>> -  tree init = build_int_cst (TREE_TYPE (niters), 0);
>>>>> -  tree step = build_int_cst (TREE_TYPE (niters), 1);
>>>>>    source_location loop_loc;
>>>>>    enum tree_code code;
>>>>> +  tree niters_type = TREE_TYPE (niters);
>>>>>
>>>>>    orig_cond = get_loop_exit_condition (loop);
>>>>>    gcc_assert (orig_cond);
>>>>>    loop_cond_gsi = gsi_for_stmt (orig_cond);
>>>>>
>>>>> +  tree init, limit;
>>>>> +  if (!niters_maybe_zero && integer_onep (step))
>>>>> +    {
>>>>> +      /* In this case we can use a simple 0-based IV:
>>>>> +
>>>>> +         A:
>>>>> +           x = 0;
>>>>> +           do
>>>>> +             {
>>>>> +               ...
>>>>> +               x += 1;
>>>>> +             }
>>>>> +           while (x < NITERS); */
>>>>> +      code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
>>>>> +      init = build_zero_cst (niters_type);
>>>>> +      limit = niters;
>>>>> +    }
>>>>> +  else
>>>>> +    {
>>>>> +      /* The following works for all values of NITERS except 0:
>>>>> +
>>>>> +         B:
>>>>> +           x = 0;
>>>>> +           do
>>>>> +             {
>>>>> +               ...
>>>>> +               x += STEP;
>>>>> +             }
>>>>> +           while (x <= NITERS - STEP);
>>>>> +
>>>>> +         so that the loop continues to iterate if x + STEP - 1 < NITERS
>>>>> +         but stops if x + STEP - 1 >= NITERS.
>>>>> +
>>>>> +         However, if NITERS is zero, x never hits a value above NITERS - STEP
>>>>> +         before wrapping around. There are two obvious ways of dealing with
>>>>> +         this:
>>>>> +
>>>>> +         - start at STEP - 1 and compare x before incrementing it
>>>>> +         - start at -1 and compare x after incrementing it
>>>>> +
>>>>> +         The latter is simpler and is what we use. The loop in this case
>>>>> +         looks like:
>>>>> +
>>>>> +         C:
>>>>> +           x = -1;
>>>>> +           do
>>>>> +             {
>>>>> +               ...
>>>>> +               x += STEP;
>>>>> +             }
>>>>> +           while (x < NITERS - STEP);
>>>>> +
>>>>> +         In both cases the loop limit is NITERS - STEP. */
>>>>> +      gimple_seq seq = NULL;
>>>>> +      limit = force_gimple_operand (niters, &seq, true, NULL_TREE);
>>>>> +      limit = gimple_build (&seq, MINUS_EXPR, TREE_TYPE (limit), limit, step);
>>>>> +      if (seq)
>>>>> +        {
>>>>> +          basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
>>>>> +          gcc_assert (!new_bb);
>>>>> +        }
>>>>> +      if (niters_maybe_zero)
>>>>> +        {
>>>>> +          /* Case C. */
>>>>> +          code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
>>>>> +          init = build_all_ones_cst (niters_type);
>>>>> +        }
>>>>> +      else
>>>>> +        {
>>>>> +          /* Case B. */
>>>>> +          code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GT_EXPR : LE_EXPR;
>>>>> +          init = build_zero_cst (niters_type);
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>>    standard_iv_increment_position (loop, &incr_gsi, &insert_after);
>>>>>    create_iv (init, step, NULL_TREE, loop,
>>>>>               &incr_gsi, insert_after, &indx_before_incr, &indx_after_incr);
>>>>> @@ -278,11 +364,10 @@ slpeel_make_loop_iterate_ntimes (struct
>>>>>    indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi, indx_after_incr,
>>>>>                                                true, NULL_TREE, true,
>>>>>                                                GSI_SAME_STMT);
>>>>> -  niters = force_gimple_operand_gsi (&loop_cond_gsi, niters, true, NULL_TREE,
>>>>> +  limit = force_gimple_operand_gsi (&loop_cond_gsi, limit, true, NULL_TREE,
>>>>>                                       true, GSI_SAME_STMT);
>>>>>
>>>>> -  code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
>>>>> -  cond_stmt = gimple_build_cond (code, indx_after_incr, niters, NULL_TREE,
>>>>> +  cond_stmt = gimple_build_cond (code, indx_after_incr, limit, NULL_TREE,
>>>>>                                   NULL_TREE);
>>>>>
>>>>>    gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
>>>>> @@ -301,8 +386,23 @@ slpeel_make_loop_iterate_ntimes (struct
>>>>>      }
>>>>>
>>>>>    /* Record the number of latch iterations. */
>>>>> -  loop->nb_iterations = fold_build2 (MINUS_EXPR, TREE_TYPE (niters), niters,
>>>>> -                                     build_int_cst (TREE_TYPE (niters), 1));
>>>>> +  if (limit == niters)
>>>>> +    /* Case A: the loop iterates NITERS times. Subtract one to get the
>>>>> +       latch count. */
>>>>> +    loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
>>>>> +                                       build_int_cst (niters_type, 1));
>>>>> +  else
>>>>> +    /* Case B or C: the loop iterates (NITERS - STEP) / STEP + 1 times.
>>>>> +       Subtract one from this to get the latch count. */
>>>>> +    loop->nb_iterations = fold_build2 (TRUNC_DIV_EXPR, niters_type,
>>>>> +                                       limit, step);
>>>>> +
>>>>> +  if (final_iv)
>>>>> +    {
>>>>> +      gassign *assign = gimple_build_assign (final_iv, MINUS_EXPR,
>>>>> +                                             indx_after_incr, init);
>>>>> +      gsi_insert_on_edge_immediate (single_exit (loop), assign);
>>>>> +    }
>>>>>  }
>>>>>
>>>>>  /* Helper routine of slpeel_tree_duplicate_loop_to_edge_cfg.
>>>>> @@ -1170,23 +1270,32 @@ vect_gen_scalar_loop_niters (tree niters
>>>>>    return niters;
>>>>>  }
>>>>>
>>>>> -/* This function generates the following statements:
>>>>> +/* NITERS is the number of times that the original scalar loop executes
>>>>> +   after peeling. Work out the maximum number of iterations N that can
>>>>> +   be handled by the vectorized form of the loop and then either:
>>>>> +
>>>>> +   a) set *STEP_VECTOR_PTR to the vectorization factor and generate:
>>>>> +
>>>>> +        niters_vector = N
>>>>> +
>>>>> +   b) set *STEP_VECTOR_PTR to one and generate:
>>>>>
>>>>> -   niters = number of iterations loop executes (after peeling)
>>>>> -   niters_vector = niters / vf
>>>>> +        niters_vector = N / vf
>>>>>
>>>>> -   and places them on the loop preheader edge. NITERS_NO_OVERFLOW is
>>>>> -   true if NITERS doesn't overflow. */
>>>>> +   In both cases, store niters_vector in *NITERS_VECTOR_PTR and add
>>>>> +   any new statements on the loop preheader edge. NITERS_NO_OVERFLOW
>>>>> +   is true if NITERS doesn't overflow (i.e. if NITERS is always nonzero). */
>>>>>
>>>>>  void
>>>>>  vect_gen_vector_loop_niters (loop_vec_info loop_vinfo, tree niters,
>>>>> -                             tree *niters_vector_ptr, bool niters_no_overflow)
>>>>> +                             tree *niters_vector_ptr, tree *step_vector_ptr,
>>>>> +                             bool niters_no_overflow)
>>>>>  {
>>>>>    tree ni_minus_gap, var;
>>>>> -  tree niters_vector, type = TREE_TYPE (niters);
>>>>> +  tree niters_vector, step_vector, type = TREE_TYPE (niters);
>>>>>    int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>>>>    edge pe = loop_preheader_edge (LOOP_VINFO_LOOP (loop_vinfo));
>>>>> -  tree log_vf = build_int_cst (type, exact_log2 (vf));
>>>>> +  tree log_vf = NULL_TREE;
>>>>>
>>>>>    /* If epilogue loop is required because of data accesses with gaps, we
>>>>>       subtract one iteration from the total number of iterations here for
>>>>> @@ -1207,21 +1316,32 @@ vect_gen_vector_loop_niters (loop_vec_in
>>>>>    else
>>>>>      ni_minus_gap = niters;
>>>>>
>>>>> -  /* Create: niters >> log2(vf) */
>>>>> -  /* If it's known that niters == number of latch executions + 1 doesn't
>>>>> -     overflow, we can generate niters >> log2(vf); otherwise we generate
>>>>> -     (niters - vf) >> log2(vf) + 1 by using the fact that we know ratio
>>>>> -     will be at least one. */
>>>>> -  if (niters_no_overflow)
>>>>> -    niters_vector = fold_build2 (RSHIFT_EXPR, type, ni_minus_gap, log_vf);
>>>>> +  if (1)
>>>>> +    {
>>>>> +      /* Create: niters >> log2(vf) */
>>>>> +      /* If it's known that niters == number of latch executions + 1 doesn't
>>>>> +         overflow, we can generate niters >> log2(vf); otherwise we generate
>>>>> +         (niters - vf) >> log2(vf) + 1 by using the fact that we know ratio
>>>>> +         will be at least one. */
>>>>> +      log_vf = build_int_cst (type, exact_log2 (vf));
>>>>> +      if (niters_no_overflow)
>>>>> +        niters_vector = fold_build2 (RSHIFT_EXPR, type, ni_minus_gap, log_vf);
>>>>> +      else
>>>>> +        niters_vector
>>>>> +          = fold_build2 (PLUS_EXPR, type,
>>>>> +                         fold_build2 (RSHIFT_EXPR, type,
>>>>> +                                      fold_build2 (MINUS_EXPR, type,
>>>>> +                                                   ni_minus_gap,
>>>>> +                                                   build_int_cst (type, vf)),
>>>>> +                                      log_vf),
>>>>> +                         build_int_cst (type, 1));
>>>>> +      step_vector = build_one_cst (type);
>>>>> +    }
>>>>>    else
>>>>> -    niters_vector
>>>>> -      = fold_build2 (PLUS_EXPR, type,
>>>>> -                     fold_build2 (RSHIFT_EXPR, type,
>>>>> -                                  fold_build2 (MINUS_EXPR, type, ni_minus_gap,
>>>>> -                                               build_int_cst (type, vf)),
>>>>> -                                  log_vf),
>>>>> -                     build_int_cst (type, 1));
>>>>> +    {
>>>>> +      niters_vector = ni_minus_gap;
>>>>> +      step_vector = build_int_cst (type, vf);
>>>>> +    }
>>>>>
>>>>>    if (!is_gimple_val (niters_vector))
>>>>>      {
>>>>> @@ -1231,7 +1351,7 @@ vect_gen_vector_loop_niters (loop_vec_in
>>>>>        gsi_insert_seq_on_edge_immediate (pe, stmts);
>>>>>        /* Peeling algorithm guarantees that vector loop bound is at least ONE,
>>>>>           we set range information to make niters analyzer's life easier. */
>>>>> -      if (stmts != NULL)
>>>>> +      if (stmts != NULL && log_vf)
>>>>>          set_range_info (niters_vector, VR_RANGE,
>>>>>                          wi::to_wide (build_int_cst (type, 1)),
>>>>>                          wi::to_wide (fold_build2 (RSHIFT_EXPR, type,
>>>>> @@ -1239,6 +1359,7 @@ vect_gen_vector_loop_niters (loop_vec_in
>>>>>                                                    log_vf)));
>>>>>      }
>>>>>    *niters_vector_ptr = niters_vector;
>>>>> +  *step_vector_ptr = step_vector;
>>>>>
>>>>>    return;
>>>>>  }
>>>>> @@ -1600,7 +1721,12 @@ slpeel_update_phi_nodes_for_lcssa (struc
>>>>>     - TH, CHECK_PROFITABILITY: Threshold of niters to vectorize loop if
>>>>>                                CHECK_PROFITABILITY is true.
>>>>>     Output:
>>>>> -   - NITERS_VECTOR: The number of iterations of loop after vectorization.
>>>>> +   - *NITERS_VECTOR and *STEP_VECTOR describe how the main loop should
>>>>> +     iterate after vectorization; see slpeel_make_loop_iterate_ntimes
>>>>> +     for details.
>>>>> +   - *NITERS_VECTOR_MULT_VF_VAR is either null or an SSA name that
>>>>> +     should be set to the number of scalar iterations handled by the
>>>>> +     vector loop. The SSA name is only used on exit from the loop.
>>>>>
>>>>>     This function peels prolog and epilog from the loop, adds guards skipping
>>>>>     PROLOG and EPILOG for various conditions. As a result, the changed CFG
>>>>> @@ -1657,8 +1783,9 @@ slpeel_update_phi_nodes_for_lcssa (struc
>>>>>
>>>>>  struct loop *
>>>>>  vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>>>>> -                 tree *niters_vector, int th, bool check_profitability,
>>>>> -                 bool niters_no_overflow)
>>>>> +                 tree *niters_vector, tree *step_vector,
>>>>> +                 tree *niters_vector_mult_vf_var, int th,
>>>>> +                 bool check_profitability, bool niters_no_overflow)
>>>>>  {
>>>>>    edge e, guard_e;
>>>>>    tree type = TREE_TYPE (niters), guard_cond;
>>>>> @@ -1754,7 +1881,9 @@ vect_do_peeling (loop_vec_info loop_vinf
>>>>>        /* Generate and update the number of iterations for prolog loop. */
>>>>>        niters_prolog = vect_gen_prolog_loop_niters (loop_vinfo, anchor,
>>>>>                                                     &bound_prolog);
>>>>> -      slpeel_make_loop_iterate_ntimes (prolog, niters_prolog);
>>>>> +      tree step_prolog = build_one_cst (TREE_TYPE (niters_prolog));
>>>>> +      slpeel_make_loop_iterate_ntimes (prolog, niters_prolog, step_prolog,
>>>>> +                                       NULL_TREE, false);
>>>>>
>>>>>        /* Skip the prolog loop. */
>>>>>        if (skip_prolog)
>>>>> @@ -1867,9 +1996,20 @@ vect_do_peeling (loop_vec_info loop_vinf
>>>>>           overflows. */
>>>>>        niters_no_overflow |= (prolog_peeling > 0);
>>>>>        vect_gen_vector_loop_niters (loop_vinfo, niters,
>>>>> -                                   niters_vector, niters_no_overflow);
>>>>> -      vect_gen_vector_loop_niters_mult_vf (loop_vinfo, *niters_vector,
>>>>> -                                           &niters_vector_mult_vf);
>>>>> +                                   niters_vector, step_vector,
>>>>> +                                   niters_no_overflow);
>>>>> +      if (!integer_onep (*step_vector))
>>>>> +        {
>>>>> +          /* On exit from the loop we will have an easy way of calculating
>>>>> +             NITERS_VECTOR / STEP * STEP. Install a dummy definition
>>>>> +             until then. */
>>>>> +          niters_vector_mult_vf = make_ssa_name (TREE_TYPE (*niters_vector));
>>>>> +          SSA_NAME_DEF_STMT (niters_vector_mult_vf) = gimple_build_nop ();
>>>>> +          *niters_vector_mult_vf_var = niters_vector_mult_vf;
>>>>> +        }
>>>>> +      else
>>>>> +        vect_gen_vector_loop_niters_mult_vf (loop_vinfo, *niters_vector,
>>>>> +                                             &niters_vector_mult_vf);
>>>>>        /* Update IVs of original loop as if they were advanced by
>>>>>           niters_vector_mult_vf steps. */
>>>>>        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
>>>>> Index: gcc/tree-vect-loop.c
>>>>> ===================================================================
>>>>> --- gcc/tree-vect-loop.c 2017-10-13 15:01:40.144777367 +0100
>>>>> +++ gcc/tree-vect-loop.c 2017-10-13 15:01:40.296014347 +0100
>>>>> @@ -7273,7 +7273,9 @@ vect_transform_loop (loop_vec_info loop_
>>>>>    basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
>>>>>    int nbbs = loop->num_nodes;
>>>>>    int i;
>>>>> -  tree niters_vector = NULL;
>>>>> +  tree niters_vector = NULL_TREE;
>>>>> +  tree step_vector = NULL_TREE;
>>>>> +  tree niters_vector_mult_vf = NULL_TREE;
>>>>>    int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>>>>    bool grouped_store;
>>>>>    bool slp_scheduled = false;
>>>>> @@ -7342,17 +7344,21 @@ vect_transform_loop (loop_vec_info loop_
>>>>>    LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo) = niters;
>>>>>    tree nitersm1 = unshare_expr (LOOP_VINFO_NITERSM1 (loop_vinfo));
>>>>>    bool niters_no_overflow = loop_niters_no_overflow (loop_vinfo);
>>>>> -  epilogue = vect_do_peeling (loop_vinfo, niters, nitersm1, &niters_vector, th,
>>>>> +  epilogue = vect_do_peeling (loop_vinfo, niters, nitersm1, &niters_vector,
>>>>> +                              &step_vector, &niters_vector_mult_vf, th,
>>>>>                                check_profitability, niters_no_overflow);
>>>>>    if (niters_vector == NULL_TREE)
>>>>>      {
>>>>>        if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>>>>> -        niters_vector
>>>>> -          = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
>>>>> -                           LOOP_VINFO_INT_NITERS (loop_vinfo) / vf);
>>>>> +        {
>>>>> +          niters_vector
>>>>> +            = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
>>>>> +                             LOOP_VINFO_INT_NITERS (loop_vinfo) / vf);
>>>>> +          step_vector = build_one_cst (TREE_TYPE (niters));
>>>>> +        }
>>>>>        else
>>>>>          vect_gen_vector_loop_niters (loop_vinfo, niters, &niters_vector,
>>>>> -                                     niters_no_overflow);
>>>>> +                                     &step_vector, niters_no_overflow);
>>>>>      }
>>>>>
>>>>>    /* 1) Make sure the loop header has exactly two entries
>>>>> @@ -7603,7 +7609,13 @@ vect_transform_loop (loop_vec_info loop_
>>>>>          } /* stmts in BB */
>>>>>      } /* BBs in loop */
>>>>>
>>>>> -  slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>>>>> +  /* The vectorization factor is always > 1, so if we use an IV increment of 1,
>>>>> +     a zero NITERS becomes a nonzero NITERS_VECTOR. */
>>>>> +  if (integer_onep (step_vector))
>>>>> +    niters_no_overflow = true;
>>>>> +  slpeel_make_loop_iterate_ntimes (loop, niters_vector, step_vector,
>>>>> +                                   niters_vector_mult_vf,
>>>>> +                                   !niters_no_overflow);
>>>>>
>>>>>    scale_profile_for_vect_loop (loop, vf);
>>>>>
>>>>> Index: gcc/tree-vectorizer.h
>>>>> ===================================================================
>>>>> --- gcc/tree-vectorizer.h 2017-10-13 15:01:40.144777367 +0100
>>>>> +++ gcc/tree-vectorizer.h 2017-10-13 15:01:40.296014347 +0100
>>>>> @@ -1138,13 +1138,14 @@ vect_get_scalar_dr_size (struct data_ref
>>>>>
>>>>>  /* Simple loop peeling and versioning utilities for vectorizer's purposes -
>>>>>     in tree-vect-loop-manip.c. */
>>>>> -extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree);
>>>>> +extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree, tree,
>>>>> +                                             tree, bool);
>>>>>  extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge);
>>>>>  struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *,
>>>>>                                                       struct loop *, edge);
>>>>>  extern void vect_loop_versioning (loop_vec_info, unsigned int, bool);
>>>>>  extern struct loop *vect_do_peeling (loop_vec_info, tree, tree,
>>>>> -                                     tree *, int, bool, bool);
>>>>> +                                     tree *, tree *, tree *, int, bool, bool);
>>>>>  extern source_location find_loop_location (struct loop *);
>>>>>  extern bool vect_can_advance_ivs_p (loop_vec_info);
>>>>>
>>>>> @@ -1258,7 +1259,8 @@ extern gimple *vect_force_simple_reducti
>>>>>  /* Drive for loop analysis stage. */
>>>>>  extern loop_vec_info vect_analyze_loop (struct loop *, loop_vec_info);
>>>>>  extern tree vect_build_loop_niters (loop_vec_info, bool * = NULL);
>>>>> -extern void vect_gen_vector_loop_niters (loop_vec_info, tree, tree *, bool);
>>>>> +extern void vect_gen_vector_loop_niters (loop_vec_info, tree, tree *,
>>>>> +                                         tree *, bool);
>>>>>  /* Drive for loop transformation stage. */
>>>>>  extern struct loop *vect_transform_loop (loop_vec_info);
>>>>>  extern loop_vec_info vect_analyze_loop_form (struct loop *);
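
As a footnote to the quoted vect_gen_vector_loop_niters hunk: its comment says that when NITERS may have overflowed, the code generates (niters - vf) >> log2(vf) + 1 rather than niters >> log2(vf). The sketch below only illustrates that arithmetic on a uint8_t. The helper names are hypothetical and nothing here is GCC API; it assumes vf is a power of two and that niters is either 0 (standing for 256) or at least vf.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the plain form, niters >> log2(vf).  Only valid
   when niters did not wrap around to zero.  */
static uint8_t
vector_niters_plain (uint8_t niters, unsigned log_vf)
{
  return (uint8_t) (niters >> log_vf);
}

/* Hypothetical helper: the overflow-safe form,
   ((niters - vf) >> log2(vf)) + 1, evaluated in 8-bit arithmetic.
   It relies on the result being at least one.  */
static uint8_t
vector_niters_overflow_safe (uint8_t niters, unsigned log_vf)
{
  uint8_t vf = (uint8_t) (1u << log_vf);
  assert (niters == 0 || niters >= vf);
  uint8_t biased = (uint8_t) (niters - vf);
  return (uint8_t) ((biased >> log_vf) + 1);
}
```

For niters == 0 (256 scalar iterations) and vf == 4, the plain form yields 0 while the overflow-safe form yields 64. When niters has not wrapped, e.g. niters == 10, both yield 2.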