On 12/11/2015 10:53, "Richard Biener" <richard.guent...@gmail.com> wrote:

>On Wed, Nov 11, 2015 at 7:54 PM, Alan Hayward <alan.hayw...@arm.com>
>wrote:
>>
>>
>> On 11/11/2015 13:25, "Richard Biener" <richard.guent...@gmail.com>
>>wrote:
>>
>>>On Wed, Nov 11, 2015 at 1:22 PM, Alan Hayward <alan.hayw...@arm.com>
>>>wrote:
>>>> Hi,
>>>> I hoped to post this in time for Monday’s cut off date, but
>>>>circumstances
>>>> delayed me until today. Hoping if possible this patch will still be
>>>>able
>>>> to go in.
>>>>
>>>>
>>>> This patch builds upon the change for PR65947, and reduces the amount
>>>>of
>>>> code produced in a vectorized condition reduction where operand 2 of
>>>>the
>>>> COND_EXPR is an assignment of a increasing integer induction variable
>>>>that
>>>> won't wrap.
>>>>
>>>>
>>>> For example (assuming all types are ints), this is a match:
>>>>
>>>> last = 5;
>>>> for (i = 0; i < N; i++)
>>>>   if (a[i] < min_v)
>>>>     last = i;
>>>>
>>>> Whereas, this is not because the result is based off a memory access:
>>>> last = 5;
>>>> for (i = 0; i < N; i++)
>>>>   if (a[i] < min_v)
>>>>     last = a[i];
>>>>
>>>> In the integer induction variable case we can just use a MAX reduction
>>>>and
>>>> skip all the code I added in my vectorized condition reduction patch -
>>>>the
>>>> additional induction variables in vectorizable_reduction () and the
>>>> additional checks in vect_create_epilog_for_reduction (). From the
>>>>patch
>>>> diff only, it's not immediately obvious that those parts will be
>>>>skipped
>>>> as there is no code changes in those areas.
>>>>
>>>> The initial value of the induction variable is force set to zero, as
>>>>any
>>>> other value could effect the result of the induction. At the end of
>>>>the
>>>> loop, if the result is zero, then we restore the original initial
>>>>value.
>>>
>>>+static bool
>>>+is_integer_induction (gimple *stmt, struct loop *loop)
>>>
>>>is_nonwrapping_integer_induction?
>>>
>>>+  tree lhs_max = TYPE_MAX_VALUE (TREE_TYPE (gimple_phi_result (stmt)));
>>>
>>>don't use TYPE_MAX_VALUE.
>>>
>>>+  /* Check that the induction increments.  */
>>>+  if (tree_int_cst_compare (step, size_zero_node) <= 0)
>>>+    return false;
>>>
>>>tree_int_cst_sgn (step) == -1
>>>
>>>+  /* Check that the max size of the loop will not wrap.  */
>>>+
>>>+  if (! max_loop_iterations (loop, &ni))
>>>+    return false;
>>>+  /* Convert backedges to iterations.  */
>>>+  ni += 1;
>>>
>>>just use max_stmt_executions (loop, &ni) which properly checks for
>>>overflow
>>>of the +1.
>>>
>>>+  max_loop_value = wi::add (wi::to_widest (base),
>>>+                           wi::mul (wi::to_widest (step), ni));
>>>+
>>>+  if (wi::gtu_p (max_loop_value, wi::to_widest (lhs_max)))
>>>+    return false;
>>>
>>>you miss a check for the wi::add / wi::mul to overflow.  You can use
>>>extra args to determine this.
>>>
>>>Instead of TYPE_MAX_VALUE use wi::max_value (precision, sign).
>>>
>>>I wonder if you want to skip all the overflow checks for
>>>TYPE_OVERFLOW_UNDEFINED
>>>IV types?
>>>
>>
>> Ok with all the above.
>>
>> Tried using max_value () but this gave me a wide_int instead of a
>> widest_int.
>> Instead I've replaced with min_precision and GET_MODE_BITSIZE.
>>
>> Added an extra check for when the type is TYPE_OVERFLOW_UNDEFINED.
>
>+         /* Set the loop-entry arg of the reduction-phi.  */
>+
>+         if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
>+               == INTEGER_INDUC_COND_REDUCTION)
>
>extra vertical space
>
>+             tree zero = build_int_cst ( TREE_TYPE (vec_init_def_type),
>0);
>+             tree zero_vec = build_vector_from_val (vec_init_def_type,
>zero);
>+
>
>build_zero_cst (vec_init_def_type);
>
>+         else
>+           {
>+             add_phi_arg (as_a <gphi *> (phi), vec_init_def,
>                       loop_preheader_edge (loop), UNKNOWN_LOCATION);
>+           }
>
>no {}s around single stmts
>
>+         tree comparez = build2 (EQ_EXPR, boolean_type_node, new_temp,
>zero);
>
>please no l33t speech


Ha! That wasn't the intention, it was short for "compare zero". Changed to
"zcompare" becuase I didn't want to use "compare".



>
>+         tmp = build3 (COND_EXPR, scalar_type, comparez, initial_def,
>+                       new_temp);
>+
>+         epilog_stmt = gimple_build_assign (new_scalar_dest, tmp);
>+         new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
>+         gimple_assign_set_lhs (epilog_stmt, new_temp);
>
>epilog_stmt = gimple_build_assign (make_ssa_name (new_scalar_dest),
>                                                    COND_EXPR,
>compare, initial_def, new_temp);
>

I had to pull the make_ssa_name out to a variable, as it's required for a
scalar_results.safe_push () a few lines down.



>
>+  /* Check that the max size of the loop will not wrap.  */
>+
>+  if (TYPE_OVERFLOW_UNDEFINED (lhs_type))
>+    {
>+      return (GET_MODE_BITSIZE (TYPE_MODE (lhs_type))
>+             >= GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (base))));
>
>this mode check will always be true as lhs_type and base are from the
>same PHI node.
>
>+  return (wi::min_precision (max_loop_value, TYPE_SIGN (lhs_type))
>+         <= GET_MODE_BITSIZE (TYPE_MODE (lhs_type)));
>
>please use TYPE_PRECISION (lhs_type) instead.
>
>Ok with those changes.
>

Submitted to head with changes as requested.


Alan.

Attachment: optimizeConditionReductions3.patch
Description: Binary data

Reply via email to