On Fri, 15 Nov 2019, Joel Hutton wrote:

> Forgot to CC maintainer.
> On 15/11/2019 18:03, Joel wrote:
> > Hi all,
> >
> > Just looking for some feedback on the approach.
> >
> > Currently the loop vectorizer can't vectorize the following typical 
> > loop for getting max value and index from an array:
> >
> > void test_vec(int *data, int n) {
> >         int best_i, best = 0;
> >
> >         for (int i = 0; i < n; i++) {
> >                 if (data[i] > best) {
> >                         best = data[i];
> >                         best_i = i;
> >                 }
> >         }
> >
> >         data[best_i] = data[0];
> >         data[0] = best;
> > }
> >
> > This patch adds some support for this pattern.
> >
> > This patch addresses Bug 88259.
> >
> > Regression testing is still in progress,
> > This patch does not work correctly with vect-epilogues-nomask, and the 
> > reason for that is still being investigated.

Quick posting before stage1 ends, heh?

New functions lack comments, new fields in stmt_info as well,
accesses should go through (missing) STMT_VINFO_* macros.

You are removing a safety check without replacement elsewhere:

-      /* Check there's only a single stmt the op is used on inside
-         of the loop.  */
-      imm_use_iterator imm_iter;
-      gimple *op_use_stmt;
-      unsigned cnt = 0;
-      FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op)
-       if (!is_gimple_debug (op_use_stmt)
-           && flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt)))
-         cnt++;
-      if (cnt != 1)
-       {
-         fail = true;
-         break;
-       }

AFAICS you still analyze both PHIs but the correct thing to do
here is to view both SSA cycles as a single one - when analyzing

  # best_i_25 = PHI <best_i_11(8), best_i_16(D)(18)>
  # best_26 = PHI <best_13(8), 0(18)>
...
  best_i_11 = _4 <= best_26 ? best_i_25 : i_27;
  best_13 = MAX_EXPR <_4, best_26>;

and starting from the best_i_25 PHI nothing prevents check_reduction_path
SCC finding to walk to the best_26 cycle but luck(?).

And the sanity check should be that all uses are within the
detected cycle (which then would be the case).

So your handling looks like an afterthought - exactly going backwards
of my attempts to make the code better structured :/

The code-gen you do in vect_create_epilog_for_reduction needs a
comment - vect_create_epilog_for_reduction will be called
twice in unspecified order, so clearly unconditionally
accessing stmt_info->reduc_related_stmt->reduc_minmax_epilog_stmt
(if it is what it looks like...) is not going to work.  You are
also using IFN_REDUC_MAX which possibly isn't available.

So I think what you want to do is "detect" the SCC with two PHIs,
then - maybe in tree-vect-patterns replace it with a single PHI
and a single operation, or somehow mangle it into a SLP tree
to make it a single reduction.

So please see to all this for next stage1.

Thanks,
Richard.

> > gcc/ChangeLog:
> >
> >
> > 2019-11-15  Joel Hutton  <joel.hut...@arm.com>
> >         Tamar Christina  <tamar.christ...@arm.com>
> >
> >     PR tree-optimization/88259
> >     * tree-vect-loop.c (vect_reassociating_reduction_simple_p): New 
> > function.
> >     (vect_recog_minmax_index_pattern): New function.
> >     (vect_is_simple_reduction): Add check for minmax pattern.
> >     (vect_model_reduction_cost): Add case for minmax pattern.
> >     (vect_create_epilog_for_reduction): Add fixup for minmax epilog.
> >     * tree-vectorizer.h (enum vect_reduction_type): Add 
> > INDEX_MINMAX_REDUCTION reduction type.

Reply via email to