On Mon, Aug 15, 2016 at 4:16 PM, Alan Hayward <alan.hayw...@arm.com> wrote:
>
>
> On 15/08/2016 12:17, "Richard Biener" <richard.guent...@gmail.com> wrote:
>
>>On Mon, Aug 15, 2016 at 11:48 AM, Alan Hayward <alan.hayw...@arm.com>
>>wrote:
>>> The testcase pr71752.c was failing because the SLP code was generating
>>>an
>>> SLP
>>> vector using arguments from the SLP scalar stmts, but was using the
>>>wrong
>>> argument number.
>>>
>>> vect_get_slp_defs() is given a vector of operands. When calling down to
>>> vect_get_constant_vectors it uses i as op_num - making the assumption
>>>that
>>> the
>>> first op in the vector refers to the first argument in the SLP scalar
>>> statement, the second op refers to the second arg and so on.
>>>
>>> However, previously in vectorizable_reduction, the call to
>>> vect_get_vec_defs
>>> destroyed this ordering by potentially only passing op1.
>>>
>>> The solution is in vectorizable_reduction to create a vector of operands
>>> equal
>>> in size to the number of arguments in the SLP statements. We maintain
>>>the
>>> argument ordering and if we don't require defs for that argument we
>>>instead
>>> push NULL into the vector. In vect_get_slp_defs we need to handle cases
>>> where
>>> an op might be NULL.
>>>
>>> Tested with a check run on X86 and AArch64.
>>> Ok to commit?
>>>
>>
>>Ugh.  Note the logic in vect_get_slp_defs is incredibly fragile - I
>>think you can't
>>simply "skip" ops the way you do as you need to still increment
>>child_index
>>accordingly for ignored ops.
>
> Agreed, I should be maintaining the child_index.
> Looking at the code, I need a valid oprnd for that code to work.
>
> Given that the size of the ops vector is never more than 3, it probably
> makes
> sense to reset child_index each time and do a quick for loop through all
> the
> children.

Hmm, yes - that should work.  It should also remove the need to keep
operand order in sync?  So it might no longer require the
vectorizable_reduction change.

Richard.

>>
>>Why not let the function compute defs for all ops?  That said, the
>>vectorizable_reduction
>>part certainly is fixing a bug (I think I've seen similar issues
>>elsewhere though).
>
> If you let it compute defs for all ops then you can end up creating invalid
> initial value SLP vectors which cause SSA failures (even though nothing
> uses those
> vectors).
>
> In the test case, SLP is defined for _3. Op1 of this is q4_lsm.5_8, but
> that is
> a loop phi. Creating SLP vec refs for it results in vect_cst__67 and
> vect_cst__68, which are clearly invalid:
>
>
>
> <bb 4>:
>   _58 = yg_lsm.7_23 + 1;
>   _59 = _58 + 1;
>   _60 = _58 + 2;
>   vect_cst__61 = {yg_lsm.7_23, _58, _59, _60};
>   vect_cst__62 = { 4, 4, 4, 4 };
>   vect_cst__67 = {q4_lsm.5_8, z5_19, q4_lsm.5_8, z5_19};
>   vect_cst__68 = {q4_lsm.5_8, z5_19, q4_lsm.5_8, z5_19};
>   vect_cst__69 = {jv_9(D), jv_9(D), jv_9(D), jv_9(D)};
>   vect_cst__70 = {jv_9(D), jv_9(D), jv_9(D), jv_9(D)};
>   vect_cst__73 = {0, 0, 0, 0};
>   vect_cst__74 = {q4_lsm.5_16, z5_10(D), 0, 0};
>   vect_cst__91 = { 1, 1, 1, 1 };
>
>   <bb 5>:
>   # z5_19 = PHI <z5_10(D)(4), z5_13(10)>
>   # q4_lsm.5_8 = PHI <q4_lsm.5_16(4), _3(10)>
>   # yg_lsm.7_17 = PHI <yg_lsm.7_23(4), _5(10)>
>   # vect_vec_iv_.14_63 = PHI <vect_cst__61(4), vect_vec_iv_.14_64(10)>
>   # vect__3.15_65 = PHI <vect_cst__74(4), vect__3.15_71(10)>
>   # vect__3.15_66 = PHI <vect_cst__73(4), vect__3.15_72(10)>
>   # ivtmp_94 = PHI <0(4), ivtmp_95(10)>
>   vect_vec_iv_.14_64 = vect_vec_iv_.14_63 + vect_cst__62;
>   _3 = q4_lsm.5_8 - jv_9(D);
>   vect__3.15_71 = vect__3.15_65 - vect_cst__70;
>   vect__3.15_72 = vect__3.15_66 - vect_cst__69;
>   z5_13 = z5_19 - jv_9(D);
>   vect__5.17_92 = vect_vec_iv_.14_63 + vect_cst__91;
>   _5 = yg_lsm.7_17 + 1;
>   ivtmp_95 = ivtmp_94 + 1;
>   if (ivtmp_95 < bnd.11_18)
>     goto <bb 10>;
>   else
>     goto <bb 8>;
>
>
>
>
>>
>>Richard.
>>
>>> Changelog:
>>>
>>> gcc/
>>>         * tree-vect-loop.c (vectorizable_reduction): Keep SLP operand
>>>ordering.
>>>         * tree-vect-slp.c (vect_get_slp_defs): Handle null operands.
>>>
>>> gcc/testsuite/
>>>         * gcc.dg/vect/pr71752.c: New.
>>>
>>>
>>>
>>> Thanks,
>>> Alan.
>>>
>>>
>>>
>>>
>>> diff --git a/gcc/testsuite/gcc.dg/vect/pr71752.c
>>> b/gcc/testsuite/gcc.dg/vect/pr71752.c
>>> new file mode 100644
>>> index
>>>
>>>0000000000000000000000000000000000000000..8d26754b4fedf8b104caae8742a445d
>>>ff
>>> bf23f0a
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/vect/pr71752.c
>>> @@ -0,0 +1,19 @@
>>> +/* { dg-do compile } */
>>> +
>>> +unsigned int q4, yg;
>>> +
>>> +unsigned int
>>> +w6 (unsigned int z5, unsigned int jv)
>>> +{
>>> +  unsigned int *f2 = &jv;
>>> +
>>> +  while (*f2 < 21)
>>> +    {
>>> +      q4 -= jv;
>>> +      z5 -= jv;
>>> +      f2 = &yg;
>>> +      ++(*f2);
>>> +    }
>>> +  return z5;
>>> +}
>>> +
>>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>>> index
>>>
>>>2a7e0c6661bc1ba82c9f03720e550749f2252a7c..826481af3d1d8b29bcdbd7d81c0fd5a
>>>85
>>> 9ecd9b0 100644
>>> --- a/gcc/tree-vect-loop.c
>>> +++ b/gcc/tree-vect-loop.c
>>> @@ -5364,7 +5364,7 @@ vectorizable_reduction (gimple *stmt,
>>> gimple_stmt_iterator *gsi,
>>>    auto_vec<tree> vect_defs;
>>>    auto_vec<gimple *> phis;
>>>    int vec_num;
>>> -  tree def0, def1, tem, op0, op1 = NULL_TREE;
>>> +  tree def0, def1, tem, op1 = NULL_TREE;
>>>    bool first_p = true;
>>>    tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type =
>>>NULL_TREE;
>>>    gimple *cond_expr_induction_def_stmt = NULL;
>>> @@ -5964,29 +5964,36 @@ vectorizable_reduction (gimple *stmt,
>>> gimple_stmt_iterator *gsi,
>>>        /* Handle uses.  */
>>>        if (j == 0)
>>>          {
>>> -          op0 = ops[!reduc_index];
>>> -          if (op_type == ternary_op)
>>> -            {
>>> -              if (reduc_index == 0)
>>> -                op1 = ops[2];
>>> -              else
>>> -                op1 = ops[1];
>>> -            }
>>> +         if (slp_node)
>>> +           {
>>> +             /* Get vec defs for all the operands except the reduction
>>>index,
>>> +               ensuring the ordering of the ops in the vector is kept.
>>> */
>>> +             auto_vec<tree, 3> slp_ops;
>>> +             auto_vec<vec<tree>, 3> vec_defs;
>>>
>>> -          if (slp_node)
>>> -            vect_get_vec_defs (op0, op1, stmt, &vec_oprnds0,
>>>&vec_oprnds1,
>>> -                               slp_node, -1);
>>> +             slp_ops.quick_push ((reduc_index == 0) ? NULL : ops[0]);
>>> +             slp_ops.quick_push ((reduc_index == 1) ? NULL : ops[1]);
>>> +             if (op_type == ternary_op)
>>> +               slp_ops.quick_push ((reduc_index == 2) ? NULL : ops[2]);
>>> +
>>> +             vect_get_slp_defs (slp_ops, slp_node, &vec_defs, -1);
>>> +
>>> +             vec_oprnds0.safe_splice (vec_defs[(reduc_index == 0) ? 1
>>>: 0]);
>>> +             if (op_type == ternary_op)
>>> +               vec_oprnds1.safe_splice (vec_defs[(reduc_index == 2) ?
>>>1 : 2]);
>>> +           }
>>>            else
>>> -            {
>>> +           {
>>>                loop_vec_def0 = vect_get_vec_def_for_operand
>>> (ops[!reduc_index],
>>>                                                              stmt);
>>>                vec_oprnds0.quick_push (loop_vec_def0);
>>>                if (op_type == ternary_op)
>>>                 {
>>> +                op1 = (reduc_index == 0) ? ops[2] : ops[1];
>>>                   loop_vec_def1 = vect_get_vec_def_for_operand (op1,
>>>stmt);
>>>                   vec_oprnds1.quick_push (loop_vec_def1);
>>>                 }
>>> -            }
>>> +           }
>>>          }
>>>        else
>>>          {
>>> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
>>> index
>>>
>>>fb325d54f1084461d44cd54a98e5b7f99541a188..7c480d59c823b5258255c8be047f050
>>>c8
>>> 3cc91fd 100644
>>> --- a/gcc/tree-vect-slp.c
>>> +++ b/gcc/tree-vect-slp.c
>>> @@ -3200,10 +3200,19 @@ vect_get_slp_defs (vec<tree> ops, slp_tree
>>> slp_node,
>>>    vec<tree> vec_defs;
>>>    tree oprnd;
>>>    bool vectorized_defs;
>>> +  bool first_iteration = true;
>>>
>>>    first_stmt = SLP_TREE_SCALAR_STMTS (slp_node)[0];
>>>    FOR_EACH_VEC_ELT (ops, i, oprnd)
>>>      {
>>> +      if (oprnd == NULL)
>>> +       {
>>> +         vec_defs = vNULL;
>>> +         vec_defs.create (0);
>>> +         vec_oprnds->quick_push (vec_defs);
>>> +         continue;
>>> +       }
>>> +
>>>        /* For each operand we check if it has vectorized definitions in
>>>a
>>> child
>>>          node or we need to create them (for invariants and constants).
>>> We
>>>          check if the LHS of the first stmt of the next child matches
>>>OPRND.
>>> @@ -3240,7 +3249,7 @@ vect_get_slp_defs (vec<tree> ops, slp_tree
>>>slp_node,
>>>
>>>        if (!vectorized_defs)
>>>          {
>>> -          if (i == 0)
>>> +         if (first_iteration)
>>>              {
>>>                number_of_vects = SLP_TREE_NUMBER_OF_VEC_STMTS
>>>(slp_node);
>>>                /* Number of vector stmts was calculated according to
>>>LHS in
>>> @@ -3276,6 +3285,8 @@ vect_get_slp_defs (vec<tree> ops, slp_tree
>>>slp_node,
>>>        /* For reductions, we only need initial values.  */
>>>        if (reduc_index != -1)
>>>          return;
>>> +
>>> +      first_iteration = false;
>>>      }
>>>  }
>>>
>>>
>>
>
>

Reply via email to