Richard Biener <richard.guent...@gmail.com> writes:
> On Thu, Oct 19, 2017 at 12:28 AM, Richard Sandiford
> <richard.sandif...@linaro.org> wrote:
>> Richard Biener <richard.guent...@gmail.com> writes:
>>> On Fri, Oct 13, 2017 at 4:10 PM, Richard Sandiford
>>> <richard.sandif...@linaro.org> wrote:
>>>> Normally we adjust the vector loop so that it iterates:
>>>>
>>>>    (original number of scalar iterations - number of peels) / VF
>>>>
>>>> times, enforcing this using an IV that starts at zero and increments
>>>> by one each iteration. However, dividing by VF would be expensive
>>>> for variable VF, so this patch adds an alternative in which the IV
>>>> increments by VF each iteration instead. We then need to take care
>>>> to handle possible overflow in the IV.
>>>
>>> Hmm, why do you need to handle possible overflow? Doesn't the
>>> original loop have a natural IV that evolves like this? After all we
>>> can compute an expression for niters of the scalar loop.
>>
>> The problem comes with loops like:
>>
>>   unsigned char i = 0;
>>   do
>>     {
>>       ...
>>       i--;
>>     }
>>   while (i != 0);
>>
>> The loop statements execute 256 times and the latch executes 255 times.
>> LOOP_VINFO_NITERSM1 is then 255 but LOOP_VINFO_NITERS (stored as an
>> unsigned char) is 0.
>
> Yes, that's an existing issue and the reason why I introduced
> NITERSM1. All remaining uses of NITERS should really go away
> because of this corner-case. So you are introducing a new user?
It's not really an NITERSM1 vs. NITERS thing. We'd get the same
result/have the same problem with NITERSM1 - (STEP - 1) instead of
NITERS - STEP, namely:

- the new IV uses the same type as NITERS

- we only want the loop to iterate if there are at least STEP scalar
  iterations to go

- this means that the natural limit is "IV <= NITERS - STEP" or
  "IV <= NITERSM1 - (STEP - 1)" (both equivalent)

- the loop is only guaranteed to terminate if the IV can hit a value
  STEP higher than that, i.e. "IV == NITERS - STEP" must be followed
  by an iteration in which the branch-back condition is false

- but if NITERS can't represent the actual number of iterations, then
  there is no value STEP higher than that

- we cope with this by starting the IV at -1 and using a limit of
  "IV < NITERS - STEP", i.e. "IV <= NITERSM1 - STEP".

So you could see this as using a limit based on NITERSM1 with a start
of -1, although the "< NITERS - STEP" avoids the need to subtract 1
at runtime.

But it seems better to use a 0-based IV when we can, since that leads
to more natural ivopts opportunities. That's why the loop tests for
the overflow case and only uses the -1-based IV when necessary.

Thanks,
Richard

>
> Richard.
>
>> This leads to things like:
>>
>>   /* Constant case.  */
>>   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>>     {
>>       tree cst_niters = LOOP_VINFO_NITERS (loop_vinfo);
>>       tree cst_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
>>
>>       gcc_assert (TREE_CODE (cst_niters) == INTEGER_CST);
>>       gcc_assert (TREE_CODE (cst_nitersm1) == INTEGER_CST);
>>       if (wi::to_widest (cst_nitersm1) < wi::to_widest (cst_niters))
>>         return true;
>>     }
>>
>> in loop_niters_no_overflow.
>>
>>>> The new mechanism isn't used yet; a later patch replaces the
>>>> "if (1)" with a check for variable VF. If the patch is OK, I'll
>>>> hold off applying it until the follow-on is ready to go in.
>>>
>>> I indeed don't like code that isn't exercised. Otherwise looks reasonable.
>>
>> Thanks.
>>
>> Richard
>>
>>> Thanks,
>>> Richard.
>>>
>>>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>>>> OK to install when the time comes?
>>>>
>>>> Richard
>>>>
>>>>
>>>> 2017-10-13  Richard Sandiford  <richard.sandif...@linaro.org>
>>>>
>>>> gcc/
>>>>         * tree-vect-loop-manip.c: Include gimple-fold.h.
>>>>         (slpeel_make_loop_iterate_ntimes): Add step, final_iv and
>>>>         niters_maybe_zero parameters.  Handle other cases besides a step of 1.
>>>>         (vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
>>>>         Add a path that uses a step of VF instead of 1, but disable it
>>>>         for now.
>>>>         (vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
>>>>         and niters_no_overflow parameters.  Update calls to
>>>>         slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
>>>>         Create a new SSA name if the latter chooses to use a step other
>>>>         than one, and return it via niters_vector_mult_vf_var.
>>>>         * tree-vect-loop.c (vect_transform_loop): Update calls to
>>>>         vect_do_peeling, vect_gen_vector_loop_niters and
>>>>         slpeel_make_loop_iterate_ntimes.
>>>>         * tree-vectorizer.h (slpeel_make_loop_iterate_ntimes, vect_do_peeling)
>>>>         (vect_gen_vector_loop_niters): Update declarations after above changes.
>>>>
>>>> Index: gcc/tree-vect-loop-manip.c
>>>> ===================================================================
>>>> --- gcc/tree-vect-loop-manip.c   2017-10-13 15:01:40.144777367 +0100
>>>> +++ gcc/tree-vect-loop-manip.c   2017-10-13 15:01:40.296014347 +0100
>>>> @@ -41,6 +41,7 @@ Software Foundation; either version 3, o
>>>>  #include "tree-scalar-evolution.h"
>>>>  #include "tree-vectorizer.h"
>>>>  #include "tree-ssa-loop-ivopts.h"
>>>> +#include "gimple-fold.h"
>>>>
>>>>  /*************************************************************************
>>>>                    Simple Loop Peeling Utilities
>>>> @@ -247,30 +248,115 @@ adjust_phi_and_debug_stmts (gimple *upda
>>>>                       gimple_bb (update_phi));
>>>>  }
>>>>
>>>> -/* Make the LOOP iterate NITERS times.  This is done by adding a new IV
>>>> -   that starts at zero, increases by one and its limit is NITERS.
>>>> +/* Make LOOP iterate N == (NITERS - STEP) / STEP + 1 times,
>>>> +   where NITERS is known to be outside the range [1, STEP - 1].
>>>> +   This is equivalent to making the loop execute NITERS / STEP
>>>> +   times when NITERS is nonzero and (1 << M) / STEP times otherwise,
>>>> +   where M is the precision of NITERS.
>>>> +
>>>> +   NITERS_MAYBE_ZERO is true if NITERS can be zero, false if it is known
>>>> +   to be >= STEP.  In the latter case N is always NITERS / STEP.
>>>> +
>>>> +   If FINAL_IV is nonnull, it is an SSA name that should be set to
>>>> +   N * STEP on exit from the loop.
>>>>
>>>>     Assumption: the exit-condition of LOOP is the last stmt in the loop.
>>>>   */
>>>>
>>>>  void
>>>> -slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters)
>>>> +slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters, tree step,
>>>> +                                 tree final_iv, bool niters_maybe_zero)
>>>>  {
>>>>    tree indx_before_incr, indx_after_incr;
>>>>    gcond *cond_stmt;
>>>>    gcond *orig_cond;
>>>> +  edge pe = loop_preheader_edge (loop);
>>>>    edge exit_edge = single_exit (loop);
>>>>    gimple_stmt_iterator loop_cond_gsi;
>>>>    gimple_stmt_iterator incr_gsi;
>>>>    bool insert_after;
>>>> -  tree init = build_int_cst (TREE_TYPE (niters), 0);
>>>> -  tree step = build_int_cst (TREE_TYPE (niters), 1);
>>>>    source_location loop_loc;
>>>>    enum tree_code code;
>>>> +  tree niters_type = TREE_TYPE (niters);
>>>>
>>>>    orig_cond = get_loop_exit_condition (loop);
>>>>    gcc_assert (orig_cond);
>>>>    loop_cond_gsi = gsi_for_stmt (orig_cond);
>>>>
>>>> +  tree init, limit;
>>>> +  if (!niters_maybe_zero && integer_onep (step))
>>>> +    {
>>>> +      /* In this case we can use a simple 0-based IV:
>>>> +
>>>> +         A:
>>>> +           x = 0;
>>>> +           do
>>>> +             {
>>>> +               ...
>>>> +               x += 1;
>>>> +             }
>>>> +           while (x < NITERS);  */
>>>> +      code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
>>>> +      init = build_zero_cst (niters_type);
>>>> +      limit = niters;
>>>> +    }
>>>> +  else
>>>> +    {
>>>> +      /* The following works for all values of NITERS except 0:
>>>> +
>>>> +         B:
>>>> +           x = 0;
>>>> +           do
>>>> +             {
>>>> +               ...
>>>> +               x += STEP;
>>>> +             }
>>>> +           while (x <= NITERS - STEP);
>>>> +
>>>> +         so that the loop continues to iterate if x + STEP - 1 < NITERS
>>>> +         but stops if x + STEP - 1 >= NITERS.
>>>> +
>>>> +         However, if NITERS is zero, x never hits a value above NITERS - STEP
>>>> +         before wrapping around.  There are two obvious ways of dealing with
>>>> +         this:
>>>> +
>>>> +         - start at STEP - 1 and compare x before incrementing it
>>>> +         - start at -1 and compare x after incrementing it
>>>> +
>>>> +         The latter is simpler and is what we use.  The loop in this case
>>>> +         looks like:
>>>> +
>>>> +         C:
>>>> +           x = -1;
>>>> +           do
>>>> +             {
>>>> +               ...
>>>> +               x += STEP;
>>>> +             }
>>>> +           while (x < NITERS - STEP);
>>>> +
>>>> +         In both cases the loop limit is NITERS - STEP.  */
>>>> +      gimple_seq seq = NULL;
>>>> +      limit = force_gimple_operand (niters, &seq, true, NULL_TREE);
>>>> +      limit = gimple_build (&seq, MINUS_EXPR, TREE_TYPE (limit), limit, step);
>>>> +      if (seq)
>>>> +        {
>>>> +          basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
>>>> +          gcc_assert (!new_bb);
>>>> +        }
>>>> +      if (niters_maybe_zero)
>>>> +        {
>>>> +          /* Case C.  */
>>>> +          code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
>>>> +          init = build_all_ones_cst (niters_type);
>>>> +        }
>>>> +      else
>>>> +        {
>>>> +          /* Case B.  */
>>>> +          code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GT_EXPR : LE_EXPR;
>>>> +          init = build_zero_cst (niters_type);
>>>> +        }
>>>> +    }
>>>> +
>>>>    standard_iv_increment_position (loop, &incr_gsi, &insert_after);
>>>>    create_iv (init, step, NULL_TREE, loop,
>>>>               &incr_gsi, insert_after, &indx_before_incr,
>>>>               &indx_after_incr);
>>>> @@ -278,11 +364,10 @@ slpeel_make_loop_iterate_ntimes (struct
>>>>    indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi, indx_after_incr,
>>>>                                                true, NULL_TREE, true,
>>>>                                                GSI_SAME_STMT);
>>>> -  niters = force_gimple_operand_gsi (&loop_cond_gsi, niters, true, NULL_TREE,
>>>> +  limit = force_gimple_operand_gsi (&loop_cond_gsi, limit, true, NULL_TREE,
>>>>                                       true, GSI_SAME_STMT);
>>>>
>>>> -  code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
>>>> -  cond_stmt = gimple_build_cond (code, indx_after_incr, niters, NULL_TREE,
>>>> +  cond_stmt = gimple_build_cond (code, indx_after_incr, limit, NULL_TREE,
>>>>                                   NULL_TREE);
>>>>
>>>>    gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
>>>> @@ -301,8 +386,23 @@ slpeel_make_loop_iterate_ntimes (struct
>>>>      }
>>>>
>>>>    /* Record the number of latch iterations.  */
>>>> -  loop->nb_iterations = fold_build2 (MINUS_EXPR, TREE_TYPE (niters), niters,
>>>> -                                     build_int_cst (TREE_TYPE (niters), 1));
>>>> +  if (limit == niters)
>>>> +    /* Case A: the loop iterates NITERS times.  Subtract one to get the
>>>> +       latch count.  */
>>>> +    loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
>>>> +                                       build_int_cst (niters_type, 1));
>>>> +  else
>>>> +    /* Case B or C: the loop iterates (NITERS - STEP) / STEP + 1 times.
>>>> +       Subtract one from this to get the latch count.  */
>>>> +    loop->nb_iterations = fold_build2 (TRUNC_DIV_EXPR, niters_type,
>>>> +                                       limit, step);
>>>> +
>>>> +  if (final_iv)
>>>> +    {
>>>> +      gassign *assign = gimple_build_assign (final_iv, MINUS_EXPR,
>>>> +                                             indx_after_incr, init);
>>>> +      gsi_insert_on_edge_immediate (single_exit (loop), assign);
>>>> +    }
>>>>  }
>>>>
>>>>  /* Helper routine of slpeel_tree_duplicate_loop_to_edge_cfg.
>>>> @@ -1170,23 +1270,32 @@ vect_gen_scalar_loop_niters (tree niters
>>>>    return niters;
>>>>  }
>>>>
>>>> -/* This function generates the following statements:
>>>> +/* NITERS is the number of times that the original scalar loop executes
>>>> +   after peeling.  Work out the maximum number of iterations N that can
>>>> +   be handled by the vectorized form of the loop and then either:
>>>> +
>>>> +   a) set *STEP_VECTOR_PTR to the vectorization factor and generate:
>>>> +
>>>> +        niters_vector = N
>>>> +
>>>> +   b) set *STEP_VECTOR_PTR to one and generate:
>>>>
>>>> -   niters = number of iterations loop executes (after peeling)
>>>> -   niters_vector = niters / vf
>>>> +        niters_vector = N / vf
>>>>
>>>> -   and places them on the loop preheader edge.  NITERS_NO_OVERFLOW is
>>>> -   true if NITERS doesn't overflow.  */
>>>> +   In both cases, store niters_vector in *NITERS_VECTOR_PTR and add
>>>> +   any new statements on the loop preheader edge.  NITERS_NO_OVERFLOW
>>>> +   is true if NITERS doesn't overflow (i.e. if NITERS is always nonzero).  */
>>>>
>>>>  void
>>>>  vect_gen_vector_loop_niters (loop_vec_info loop_vinfo, tree niters,
>>>> -                             tree *niters_vector_ptr, bool niters_no_overflow)
>>>> +                             tree *niters_vector_ptr, tree *step_vector_ptr,
>>>> +                             bool niters_no_overflow)
>>>>  {
>>>>    tree ni_minus_gap, var;
>>>> -  tree niters_vector, type = TREE_TYPE (niters);
>>>> +  tree niters_vector, step_vector, type = TREE_TYPE (niters);
>>>>    int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>>>    edge pe = loop_preheader_edge (LOOP_VINFO_LOOP (loop_vinfo));
>>>> -  tree log_vf = build_int_cst (type, exact_log2 (vf));
>>>> +  tree log_vf = NULL_TREE;
>>>>
>>>>    /* If epilogue loop is required because of data accesses with gaps, we
>>>>       subtract one iteration from the total number of iterations here for
>>>> @@ -1207,21 +1316,32 @@ vect_gen_vector_loop_niters (loop_vec_in
>>>>    else
>>>>      ni_minus_gap = niters;
>>>>
>>>> -  /* Create: niters >> log2(vf) */
>>>> -  /* If it's known that niters == number of latch executions + 1 doesn't
>>>> -     overflow, we can generate niters >> log2(vf); otherwise we generate
>>>> -     (niters - vf) >> log2(vf) + 1 by using the fact that we know ratio
>>>> -     will be at least one.  */
>>>> -  if (niters_no_overflow)
>>>> -    niters_vector = fold_build2 (RSHIFT_EXPR, type, ni_minus_gap, log_vf);
>>>> +  if (1)
>>>> +    {
>>>> +      /* Create: niters >> log2(vf) */
>>>> +      /* If it's known that niters == number of latch executions + 1 doesn't
>>>> +         overflow, we can generate niters >> log2(vf); otherwise we generate
>>>> +         (niters - vf) >> log2(vf) + 1 by using the fact that we know ratio
>>>> +         will be at least one.  */
>>>> +      log_vf = build_int_cst (type, exact_log2 (vf));
>>>> +      if (niters_no_overflow)
>>>> +        niters_vector = fold_build2 (RSHIFT_EXPR, type, ni_minus_gap, log_vf);
>>>> +      else
>>>> +        niters_vector
>>>> +          = fold_build2 (PLUS_EXPR, type,
>>>> +                         fold_build2 (RSHIFT_EXPR, type,
>>>> +                                      fold_build2 (MINUS_EXPR, type,
>>>> +                                                   ni_minus_gap,
>>>> +                                                   build_int_cst (type, vf)),
>>>> +                                      log_vf),
>>>> +                         build_int_cst (type, 1));
>>>> +      step_vector = build_one_cst (type);
>>>> +    }
>>>>    else
>>>> -    niters_vector
>>>> -      = fold_build2 (PLUS_EXPR, type,
>>>> -                     fold_build2 (RSHIFT_EXPR, type,
>>>> -                                  fold_build2 (MINUS_EXPR, type, ni_minus_gap,
>>>> -                                               build_int_cst (type, vf)),
>>>> -                                  log_vf),
>>>> -                     build_int_cst (type, 1));
>>>> +    {
>>>> +      niters_vector = ni_minus_gap;
>>>> +      step_vector = build_int_cst (type, vf);
>>>> +    }
>>>>
>>>>    if (!is_gimple_val (niters_vector))
>>>>      {
>>>> @@ -1231,7 +1351,7 @@ vect_gen_vector_loop_niters (loop_vec_in
>>>>        gsi_insert_seq_on_edge_immediate (pe, stmts);
>>>>        /* Peeling algorithm guarantees that vector loop bound is at least ONE,
>>>>           we set range information to make niters analyzer's life easier.  */
>>>> -      if (stmts != NULL)
>>>> +      if (stmts != NULL && log_vf)
>>>>          set_range_info (niters_vector, VR_RANGE,
>>>>                          wi::to_wide (build_int_cst (type, 1)),
>>>>                          wi::to_wide (fold_build2 (RSHIFT_EXPR, type,
>>>> @@ -1239,6 +1359,7 @@ vect_gen_vector_loop_niters (loop_vec_in
>>>>                                                    log_vf)));
>>>>      }
>>>>    *niters_vector_ptr = niters_vector;
>>>> +  *step_vector_ptr = step_vector;
>>>>
>>>>    return;
>>>>  }
>>>> @@ -1600,7 +1721,12 @@ slpeel_update_phi_nodes_for_lcssa (struc
>>>>     - TH, CHECK_PROFITABILITY: Threshold of niters to vectorize loop if
>>>>       CHECK_PROFITABILITY is true.
>>>>     Output:
>>>> -   - NITERS_VECTOR: The number of iterations of loop after vectorization.
>>>> +   - *NITERS_VECTOR and *STEP_VECTOR describe how the main loop should
>>>> +     iterate after vectorization; see slpeel_make_loop_iterate_ntimes
>>>> +     for details.
>>>> +   - *NITERS_VECTOR_MULT_VF_VAR is either null or an SSA name that
>>>> +     should be set to the number of scalar iterations handled by the
>>>> +     vector loop.  The SSA name is only used on exit from the loop.
>>>>
>>>>     This function peels prolog and epilog from the loop, adds guards skipping
>>>>     PROLOG and EPILOG for various conditions.  As a result, the changed CFG
>>>> @@ -1657,8 +1783,9 @@ slpeel_update_phi_nodes_for_lcssa (struc
>>>>
>>>>  struct loop *
>>>>  vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>>>> -                 tree *niters_vector, int th, bool check_profitability,
>>>> -                 bool niters_no_overflow)
>>>> +                 tree *niters_vector, tree *step_vector,
>>>> +                 tree *niters_vector_mult_vf_var, int th,
>>>> +                 bool check_profitability, bool niters_no_overflow)
>>>>  {
>>>>    edge e, guard_e;
>>>>    tree type = TREE_TYPE (niters), guard_cond;
>>>> @@ -1754,7 +1881,9 @@ vect_do_peeling (loop_vec_info loop_vinf
>>>>        /* Generate and update the number of iterations for prolog loop.  */
>>>>        niters_prolog = vect_gen_prolog_loop_niters (loop_vinfo, anchor,
>>>>                                                     &bound_prolog);
>>>> -      slpeel_make_loop_iterate_ntimes (prolog, niters_prolog);
>>>> +      tree step_prolog = build_one_cst (TREE_TYPE (niters_prolog));
>>>> +      slpeel_make_loop_iterate_ntimes (prolog, niters_prolog, step_prolog,
>>>> +                                       NULL_TREE, false);
>>>>
>>>>        /* Skip the prolog loop.  */
>>>>        if (skip_prolog)
>>>> @@ -1867,9 +1996,20 @@ vect_do_peeling (loop_vec_info loop_vinf
>>>>           overflows.  */
>>>>        niters_no_overflow |= (prolog_peeling > 0);
>>>>        vect_gen_vector_loop_niters (loop_vinfo, niters,
>>>> -                                   niters_vector, niters_no_overflow);
>>>> -      vect_gen_vector_loop_niters_mult_vf (loop_vinfo, *niters_vector,
>>>> -                                           &niters_vector_mult_vf);
>>>> +                                   niters_vector, step_vector,
>>>> +                                   niters_no_overflow);
>>>> +      if (!integer_onep (*step_vector))
>>>> +        {
>>>> +          /* On exit from the loop we will have an easy way of calculating
>>>> +             NITERS_VECTOR / STEP * STEP.  Install a dummy definition
>>>> +             until then.  */
>>>> +          niters_vector_mult_vf = make_ssa_name (TREE_TYPE (*niters_vector));
>>>> +          SSA_NAME_DEF_STMT (niters_vector_mult_vf) = gimple_build_nop ();
>>>> +          *niters_vector_mult_vf_var = niters_vector_mult_vf;
>>>> +        }
>>>> +      else
>>>> +        vect_gen_vector_loop_niters_mult_vf (loop_vinfo, *niters_vector,
>>>> +                                             &niters_vector_mult_vf);
>>>>        /* Update IVs of original loop as if they were advanced by
>>>>           niters_vector_mult_vf steps.  */
>>>>        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
>>>> Index: gcc/tree-vect-loop.c
>>>> ===================================================================
>>>> --- gcc/tree-vect-loop.c   2017-10-13 15:01:40.144777367 +0100
>>>> +++ gcc/tree-vect-loop.c   2017-10-13 15:01:40.296014347 +0100
>>>> @@ -7273,7 +7273,9 @@ vect_transform_loop (loop_vec_info loop_
>>>>    basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
>>>>    int nbbs = loop->num_nodes;
>>>>    int i;
>>>> -  tree niters_vector = NULL;
>>>> +  tree niters_vector = NULL_TREE;
>>>> +  tree step_vector = NULL_TREE;
>>>> +  tree niters_vector_mult_vf = NULL_TREE;
>>>>    int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>>>>    bool grouped_store;
>>>>    bool slp_scheduled = false;
>>>> @@ -7342,17 +7344,21 @@ vect_transform_loop (loop_vec_info loop_
>>>>    LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo) = niters;
>>>>    tree nitersm1 = unshare_expr (LOOP_VINFO_NITERSM1 (loop_vinfo));
>>>>    bool niters_no_overflow = loop_niters_no_overflow (loop_vinfo);
>>>> -  epilogue = vect_do_peeling (loop_vinfo, niters, nitersm1, &niters_vector, th,
>>>> +  epilogue = vect_do_peeling (loop_vinfo, niters, nitersm1, &niters_vector,
>>>> +                              &step_vector, &niters_vector_mult_vf, th,
>>>>                                check_profitability, niters_no_overflow);
>>>>    if (niters_vector == NULL_TREE)
>>>>      {
>>>>        if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>>>> -        niters_vector
>>>> -          = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
>>>> -                           LOOP_VINFO_INT_NITERS (loop_vinfo) / vf);
>>>> +        {
>>>> +          niters_vector
>>>> +            = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
>>>> +                             LOOP_VINFO_INT_NITERS (loop_vinfo) / vf);
>>>> +          step_vector = build_one_cst (TREE_TYPE (niters));
>>>> +        }
>>>>        else
>>>>          vect_gen_vector_loop_niters (loop_vinfo, niters, &niters_vector,
>>>> -                                     niters_no_overflow);
>>>> +                                     &step_vector, niters_no_overflow);
>>>>      }
>>>>
>>>>    /* 1) Make sure the loop header has exactly two entries
>>>> @@ -7603,7 +7609,13 @@ vect_transform_loop (loop_vec_info loop_
>>>>          }                 /* stmts in BB */
>>>>      }                     /* BBs in loop */
>>>>
>>>> -  slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>>>> +  /* The vectorization factor is always > 1, so if we use an IV increment of 1,
>>>> +     a zero NITERS becomes a nonzero NITERS_VECTOR.  */
>>>> +  if (integer_onep (step_vector))
>>>> +    niters_no_overflow = true;
>>>> +  slpeel_make_loop_iterate_ntimes (loop, niters_vector, step_vector,
>>>> +                                   niters_vector_mult_vf,
>>>> +                                   !niters_no_overflow);
>>>>
>>>>    scale_profile_for_vect_loop (loop, vf);
>>>>
>>>> Index: gcc/tree-vectorizer.h
>>>> ===================================================================
>>>> --- gcc/tree-vectorizer.h   2017-10-13 15:01:40.144777367 +0100
>>>> +++ gcc/tree-vectorizer.h   2017-10-13 15:01:40.296014347 +0100
>>>> @@ -1138,13 +1138,14 @@ vect_get_scalar_dr_size (struct data_ref
>>>>
>>>>  /* Simple loop peeling and versioning utilities for vectorizer's purposes -
>>>>     in tree-vect-loop-manip.c.  */
>>>> -extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree);
>>>> +extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree, tree,
>>>> +                                             tree, bool);
>>>>  extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge);
>>>>  struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *,
>>>>                                                       struct loop *, edge);
>>>>  extern void vect_loop_versioning (loop_vec_info, unsigned int, bool);
>>>>  extern struct loop *vect_do_peeling (loop_vec_info, tree, tree,
>>>> -                                     tree *, int, bool, bool);
>>>> +                                     tree *, tree *, tree *, int, bool, bool);
>>>>  extern source_location find_loop_location (struct loop *);
>>>>  extern bool vect_can_advance_ivs_p (loop_vec_info);
>>>>
>>>> @@ -1258,7 +1259,8 @@ extern gimple *vect_force_simple_reducti
>>>>  /* Drive for loop analysis stage.  */
>>>>  extern loop_vec_info vect_analyze_loop (struct loop *, loop_vec_info);
>>>>  extern tree vect_build_loop_niters (loop_vec_info, bool * = NULL);
>>>> -extern void vect_gen_vector_loop_niters (loop_vec_info, tree, tree *, bool);
>>>> +extern void vect_gen_vector_loop_niters (loop_vec_info, tree, tree *,
>>>> +                                         tree *, bool);
>>>>  /* Drive for loop transformation stage.  */
>>>>  extern struct loop *vect_transform_loop (loop_vec_info);
>>>>  extern loop_vec_info vect_analyze_loop_form (struct loop *);