On 12/11/2015 10:53, "Richard Biener" <richard.guent...@gmail.com> wrote:
>On Wed, Nov 11, 2015 at 7:54 PM, Alan Hayward <alan.hayw...@arm.com> >wrote: >> >> >> On 11/11/2015 13:25, "Richard Biener" <richard.guent...@gmail.com> >>wrote: >> >>>On Wed, Nov 11, 2015 at 1:22 PM, Alan Hayward <alan.hayw...@arm.com> >>>wrote: >>>> Hi, >>>> I hoped to post this in time for Monday’s cut off date, but >>>>circumstances >>>> delayed me until today. Hoping if possible this patch will still be >>>>able >>>> to go in. >>>> >>>> >>>> This patch builds upon the change for PR65947, and reduces the amount >>>>of >>>> code produced in a vectorized condition reduction where operand 2 of >>>>the >>>> COND_EXPR is an assignment of a increasing integer induction variable >>>>that >>>> won't wrap. >>>> >>>> >>>> For example (assuming all types are ints), this is a match: >>>> >>>> last = 5; >>>> for (i = 0; i < N; i++) >>>> if (a[i] < min_v) >>>> last = i; >>>> >>>> Whereas, this is not because the result is based off a memory access: >>>> last = 5; >>>> for (i = 0; i < N; i++) >>>> if (a[i] < min_v) >>>> last = a[i]; >>>> >>>> In the integer induction variable case we can just use a MAX reduction >>>>and >>>> skip all the code I added in my vectorized condition reduction patch - >>>>the >>>> additional induction variables in vectorizable_reduction () and the >>>> additional checks in vect_create_epilog_for_reduction (). From the >>>>patch >>>> diff only, it's not immediately obvious that those parts will be >>>>skipped >>>> as there is no code changes in those areas. >>>> >>>> The initial value of the induction variable is force set to zero, as >>>>any >>>> other value could effect the result of the induction. At the end of >>>>the >>>> loop, if the result is zero, then we restore the original initial >>>>value. >>> >>>+static bool >>>+is_integer_induction (gimple *stmt, struct loop *loop) >>> >>>is_nonwrapping_integer_induction? >>> >>>+ tree lhs_max = TYPE_MAX_VALUE (TREE_TYPE (gimple_phi_result (stmt))); >>> >>>don't use TYPE_MAX_VALUE. >>> >>>+ /* Check that the induction increments. */ >>>+ if (tree_int_cst_compare (step, size_zero_node) <= 0) >>>+ return false; >>> >>>tree_int_cst_sgn (step) == -1 >>> >>>+ /* Check that the max size of the loop will not wrap. */ >>>+ >>>+ if (! max_loop_iterations (loop, &ni)) >>>+ return false; >>>+ /* Convert backedges to iterations. */ >>>+ ni += 1; >>> >>>just use max_stmt_executions (loop, &ni) which properly checks for >>>overflow >>>of the +1. >>> >>>+ max_loop_value = wi::add (wi::to_widest (base), >>>+ wi::mul (wi::to_widest (step), ni)); >>>+ >>>+ if (wi::gtu_p (max_loop_value, wi::to_widest (lhs_max))) >>>+ return false; >>> >>>you miss a check for the wi::add / wi::mul to overflow. You can use >>>extra args to determine this. >>> >>>Instead of TYPE_MAX_VALUE use wi::max_value (precision, sign). >>> >>>I wonder if you want to skip all the overflow checks for >>>TYPE_OVERFLOW_UNDEFINED >>>IV types? >>> >> >> Ok with all the above. >> >> Tried using max_value () but this gave me a wide_int instead of a >> widest_int. >> Instead I've replaced with min_precision and GET_MODE_BITSIZE. >> >> Added an extra check for when the type is TYPE_OVERFLOW_UNDEFINED. > >+ /* Set the loop-entry arg of the reduction-phi. */ >+ >+ if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) >+ == INTEGER_INDUC_COND_REDUCTION) > >extra vertical space > >+ tree zero = build_int_cst ( TREE_TYPE (vec_init_def_type), >0); >+ tree zero_vec = build_vector_from_val (vec_init_def_type, >zero); >+ > >build_zero_cst (vec_init_def_type); > >+ else >+ { >+ add_phi_arg (as_a <gphi *> (phi), vec_init_def, > loop_preheader_edge (loop), UNKNOWN_LOCATION); >+ } > >no {}s around single stmts > >+ tree comparez = build2 (EQ_EXPR, boolean_type_node, new_temp, >zero); > >please no l33t speech Ha! That wasn't the intention, it was short for "compare zero". Changed to "zcompare" becuase I didn't want to use "compare". > >+ tmp = build3 (COND_EXPR, scalar_type, comparez, initial_def, >+ new_temp); >+ >+ epilog_stmt = gimple_build_assign (new_scalar_dest, tmp); >+ new_temp = make_ssa_name (new_scalar_dest, epilog_stmt); >+ gimple_assign_set_lhs (epilog_stmt, new_temp); > >epilog_stmt = gimple_build_assign (make_ssa_name (new_scalar_dest), > COND_EXPR, >compare, initial_def, new_temp); > I had to pull the make_ssa_name out to a variable, as it's required for a scalar_results.safe_push () a few lines down. > >+ /* Check that the max size of the loop will not wrap. */ >+ >+ if (TYPE_OVERFLOW_UNDEFINED (lhs_type)) >+ { >+ return (GET_MODE_BITSIZE (TYPE_MODE (lhs_type)) >+ >= GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (base)))); > >this mode check will always be true as lhs_type and base are from the >same PHI node. > >+ return (wi::min_precision (max_loop_value, TYPE_SIGN (lhs_type)) >+ <= GET_MODE_BITSIZE (TYPE_MODE (lhs_type))); > >please use TYPE_PRECISION (lhs_type) instead. > >Ok with those changes. > Submitted to head with changes as requested. Alan.
optimizeConditionReductions3.patch
Description: Binary data