Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

Alan Hayward Tue, 15 Sep 2015 08:33:34 -0700


On 15/09/2015 13:09, "Richard Biener" <richard.guent...@gmail.com> wrote:


>
>Now comments on the patch itself.
>
>+      if (code == COND_EXPR)
>+       *v_reduc_type = COND_REDUCTION;
>
>so why not simply use COND_EXPR instead of the new v_reduc_type?

v_reduc_type is also dependant on check_reduction (which comes from
!nested_cycle in vectorizable_reduction).
It seemed messy to keep on checking for both those things throughout.

In my patch to catch simpler condition reductions, I’ll be adding another
value to this enum too. v_reduc_type will be set to this new value based
on the same properties for COND_REDUCTION plus some additional constraints.

>
>+  if (check_reduction && code != COND_EXPR &&
>+      vect_is_slp_reduction (loop_info, phi, def_stmt))
>
>&&s go to the next line

ok

>
>+             /* Reduction of the max index and a reduction of the found
>+                values.  */
>+             epilogue_cost += add_stmt_cost (target_cost_data, 1,
>+                                             vec_to_scalar, stmt_info, 0,
>+                                             vect_epilogue);
>
>vec_to_scalar once isn't what the comment suggests.  Instead the
>comment suggests twice what a regular reduction would do
>but I guess we can "hide" the vec_to_scalar cost and "merge" it
>with the broadcast.  Thus make the above two vector_stmt costs?
>
>+             /* A broadcast of the max value.  */
>+             epilogue_cost += add_stmt_cost (target_cost_data, 2,
>+                                             scalar_to_vec, stmt_info, 0,
>+                                             vect_epilogue);
>
>comment suggests a single broadcast.

I’ve made a copy/paste error here. Just need to swap the 1 and the 2.


>
>@@ -3705,7 +3764,7 @@ get_initial_def_for_induction (gimple iv_phi)
>          the final vector of induction results:  */
>       exit_phi = NULL;
>       FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg)
>-        {
>+       {
>          gimple use_stmt = USE_STMT (use_p);
>          if (is_gimple_debug (use_stmt))
>            continue;
>
>please avoid unrelated whitespace changes.

Ok. I was changing “8 spaces” to a tab, but happy to revert.

>
>+      case COND_EXPR:
>+       if (v_reduc_type == COND_REDUCTION)
>+         {
>...
>+       /* Fall through.  */
>+
>       case MIN_EXPR:
>       case MAX_EXPR:
>-      case COND_EXPR:
>
>aww, so we could already handle COND_EXPR reductions?  How do they
>differ from what you add?  Did you see if that path is even exercised
>today?

Today, COND_EXPRs are only supported when they are nested inside a loop.
See the vect-cond-*.c tests.
For example:

for (j = 0; j < M; j++)
    {
  x = x_in[j];
  curr_a = a[0];

    for (i = 0; i < N; i++)
      {
        next_a = a[i+1];
        curr_a = x > c[i] ? curr_a : next_a;
      }
  x_out[j] = curr_a;
}


In that case, only the outer loop is vectorised.

>
>+           /* Create a vector of {init_value, 0, 0, 0...}.  */
>+           vec<constructor_elt, va_gc> *v;
>+           vec_alloc (v, nunits);
>+           CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val);
>+           if (SCALAR_FLOAT_TYPE_P (scalar_type))
>+             for (i = 1; i < nunits; ++i)
>+               CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
>+                                       build_real (scalar_type,
>dconst0));
>+           else
>+             for (i = 1; i < nunits; ++i)
>+               CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
>+                                       build_int_cst (scalar_type, 0));
>+           init_def = build_constructor (vectype, v);
>
>you can unify the float/int case by using build_zero_cst (scalar_type).
>Note that you should build a vector constant instead of a constructor
>if init_val is a constant.  The convenient way is to build the vector
>elements into a tree[] array and use build_vector_stat in that case.

Ok, will switch to build_zero_cst.
Also, I will switch my vector to  {init_value, init_value, init_value…}.
I had {init_value, 0, 0, 0…} because I was going have the option of
ADD_REDUC_EXPR,
But that got removed along the way.

>
>+      /* Find maximum value from the vector of found indexes.  */
>+      tree max_index = make_temp_ssa_name (index_scalar_type, NULL, "");
>
>just use make_ssa_name (index_scalar_type);

Ok

>
>+      /* Convert the vector of data to the same type as the EQ.  */
>+      tree vec_data_cast;
>+      if ( TYPE_UNSIGNED (index_vec_type))
>+       {
>
>How come it never happens the element
>sizes do not match?  (int index type and double data type?)

This was a little unclear.
The induction index is originally created as an unsigned version of the
type as the data vector.
(see the definition of cr_index_vector_type in vectorizable_reduction(),
which is then used to create cond_name)

I will remove the if and replace with a gcc_checking_assert(TYPE_UNSIGNED
(index_vec_type))


>
>+      /* Where the max index occured, use the value from the data
>vector.  */
>+      tree vec_and = make_temp_ssa_name (index_vec_type_signed, NULL,
>"");
>+      gimple vec_and_stmt = gimple_build_assign (vec_and, BIT_AND_EXPR,
>+                                                vec_compare,
>vec_data_cast);
>
>that is, don't you need to do some widening/shortening on the comparison
>result?
>(also what happens in the case "or all of the values"?)
>
>Definitely too much VIEW_CONVERT_MAGIC here for my taste ;)

Given the induction index is the same size as the data, this will be ok.

I don’t like the VIEW_CONVERT_MAGIC either! But I couldn’t see any other
way of making it work.


The case of "all of the values” is when there were no matches in the loop.
When this happens, the data vector will contain only the default values
{init_value, 0, 0, 0…}
(… once I change my code due to the previous comment, we’ll have
{init_value, init_value, init_value…} ).
The REDUC_MAX_EXPR will turn either of those vectors into a “init_value”.

>
>+   This function also handles reduction of condition expressions, for
>example:
>+     for (int i = 0; i < N; i++)
>+       if (a[i] < value)
>+        last = a[i];
>+   This is handled by vectorising the loop and creating an additional
>vector
>+   containing the loop indexes for which "a[i] < value" was true.  In the
>+   function epilogue this is reduced to a single max value and then used
>to
>+   index into the vector of results.
>
>I miss a comment that shows the kind of code we transform this to.
>"an additional vector containing the loop indexes" can't work - the vector
>will not be large enough ;)  Naiively I would have made 'last' a vector,
>performing the reduction element-wise and in the epilogue reduce
>'last' itself.  And it looks like we are already doing that for
>
>int foo (int *a)
>{
>  int val = 0;
>  for (int i = 0; i < 1024; ++i)
>    if (a[i] > val)
>      val = a[i];
>  return val;
>}
>
>I must be missing something.  Yes, I think we can't do index reduction
>yet,
>but pr65947-10.c is alrady handled?


I think an easier way of looking thinking about this is to see what
happens each iteration of the loop:

Assuming the example above and a vector length of 4.
The index vector starts as (0,0,0,0).
On the first iteration, the condition passes for, lets say, i=2 and i=3,
then index vector is (0,0,2,3)
on the next iteration, the condition passes for i=4, then index vector is
now (4,0,2,3)
on the next iteration, the condition passes for i=8 and i=9, then index
vector is now (8,9,2,3)
Etc
In the end we might end up with something like index vector=(26,2,54,14).
The “2” has not been set since the first iteration.
The only value we care about is the last one set - the 54. The rest are
meaningless to us.
At the same time the data vector has been storing values at the same time
as the index vector, and so now contains (a[26], a[2], a[54], a[14])

In the epilogue, we reduce the index vector down to “54”, then create a
vector (54,54,54,54).
This is matched against the index vector to give us
(false,false,true,false).
Then we use that vector to index the data vector, giving us (0, 0, a[54],
0).
Which we then reduce down to a single value.





>
>Stopping here.

Thanks for the comments so far.
I’ll put together a new patch with the changes above.


Alan.

Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

Reply via email to