On Mon, Oct 19, 2015 at 10:17 AM, Alan Hayward <[email protected]> wrote:
>
>
>>On 30/09/2015 13:45, "Richard Biener" <[email protected]> wrote:
>>
>>>On Wed, Sep 23, 2015 at 5:51 PM, Alan Hayward <[email protected]>
>>>wrote:
>>>>
>>>>
>>>> On 18/09/2015 14:53, "Alan Hayward" <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>>On 18/09/2015 14:26, "Alan Lawrence" <[email protected]> wrote:
>>>>>
>>>>>>On 18/09/15 13:17, Richard Biener wrote:
>>>>>>>
>>>>>>> Ok, I see.
>>>>>>>
>>>>>>> That this case is already vectorized is because it implements
>>>>>>>MAX_EXPR,
>>>>>>> modifying it slightly to
>>>>>>>
>>>>>>> int foo (int *a)
>>>>>>> {
>>>>>>> int val = 0;
>>>>>>> for (int i = 0; i < 1024; ++i)
>>>>>>> if (a[i] > val)
>>>>>>> val = a[i] + 1;
>>>>>>> return val;
>>>>>>> }
>>>>>>>
>>>>>>> makes it no longer handled by current code.
>>>>>>>
>>>>>>
>>>>>>Yes. I believe the idea for the patch is to handle arbitrary
>>>>>>expressions
>>>>>>like
>>>>>>
>>>>>>int foo (int *a)
>>>>>>{
>>>>>> int val = 0;
>>>>>> for (int i = 0; i < 1024; ++i)
>>>>>> if (some_expression (i))
>>>>>> val = another_expression (i);
>>>>>> return val;
>>>>>>}
>>>>>
>>>>>Yes, thatâs correct. Hopefully my new test cases should cover
>>>>>everything.
>>>>>
>>>>
>>>> Attached is a new version of the patch containing all the changes
>>>> requested by Richard.
>>>
>>>+ /* Compare the max index vector to the vector of found indexes to
>>>find
>>>+ the postion of the max value. This will result in either a
>>>single
>>>+ match or all of the values. */
>>>+ tree vec_compare = make_ssa_name (index_vec_type_signed);
>>>+ gimple vec_compare_stmt = gimple_build_assign (vec_compare,
>>>EQ_EXPR,
>>>+ induction_index,
>>>+ max_index_vec);
>>>
>>>I'm not sure all targets can handle this. If I deciper the code
>>>correctly then we do
>>>
>>> mask = induction_index == max_index_vec;
>>> vec_and = mask & vec_data;
>>>
>>>plus some casts. So this is basically
>>>
>>> vec_and = induction_index == max_index_vec ? vec_data : {0, 0, ... };
>>>
>>>without the need to relate the induction index vector type to the data
>>>vector type.
>>>I believe this is also the form all targets support.
>>
>>
>>Ok, Iâll replace this.
>>
>>>
>>>I am missing a comment before all this code-generation that shows the
>>>transform
>>>result with the variable names used in the code-gen. I have a hard
>>>time connecting
>>>things here.
>>
>>Ok, Iâll add some comments.
>>
>>>
>>>+ tree matched_data_reduc_cast = build1 (VIEW_CONVERT_EXPR,
>>>scalar_type,
>>>+ matched_data_reduc);
>>>+ epilog_stmt = gimple_build_assign (new_scalar_dest,
>>>+ matched_data_reduc_cast);
>>>+ new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
>>>+ gimple_assign_set_lhs (epilog_stmt, new_temp);
>>>
>>>this will leave the stmt unsimplified. scalar sign-changes should use
>>>NOP_EXPR,
>>>not VIEW_CONVERT_EXPR. The easiest fix is to use fold_convert instead.
>>>Also just do like before - first make_ssa_name and then directly use it
>>>in the
>>>gimple_build_assign.
>>
>>We need the VIEW_CONVERT_EXPR for the cases where we have float data
>>values. The index is always integer.
>>
>>
>>>
>>>The patch is somewhat hard to parse with all the indentation changes. A
>>>context
>>>diff would be much easier to read in those contexts.
>>
>>Ok, Iâll make the next patch like that
>>
>>>
>>>+ if (v_reduc_type == COND_REDUCTION)
>>>+ {
>>>+ widest_int ni;
>>>+
>>>+ if (! max_loop_iterations (loop, &ni))
>>>+ {
>>>+ if (dump_enabled_p ())
>>>+ dump_printf_loc (MSG_NOTE, vect_location,
>>>+ "loop count not known, cannot create cond "
>>>+ "reduction.\n");
>>>
>>>ugh. That's bad.
>>>
>>>+ /* The additional index will be the same type as the condition.
>>>Check
>>>+ that the loop can fit into this less one (because we'll use up
>>>the
>>>+ zero slot for when there are no matches). */
>>>+ tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type);
>>>+ if (wi::geu_p (ni, wi::to_widest (max_index)))
>>>+ {
>>>+ if (dump_enabled_p ())
>>>+ dump_printf_loc (MSG_NOTE, vect_location,
>>>+ "loop size is greater than data size.\n");
>>>+ return false;
>>>
>>>Likewise.
>>
>>We could do better if we made the index type larger.
>>But as a first implementation of this optimisation, I didnât want to
>>overcomplicate things more.
>>
>>>
>>>@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt,
>>>gimple_stmt_iterator *gsi,
>>> if (dump_enabled_p ())
>>> dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
>>>
>>>+ STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
>>>+
>>> /* FORNOW: Multiple types are not supported for condition. */
>>> if (code == COND_EXPR)
>>>
>>>this change looks odd (or wrong). The type should be _only_ set/changed
>>>during
>>>analysis.
>>
>>
>>The problem is, for COND_EXPRs, this function calls
>>vectorizable_condition(), which sets STMT_VINFO_TYPE to
>>condition_vec_info_type.
Ah, the pre-existing issue of the transform phase re-doing the analysis...
a fix would be to condition that call on vec_stmt == NULL, thus analysis
phase.
>>Therefore we need something to restore it back to reduc_vec_info_type on
>>the non-analysis call.
>>
>>I considered setting STMT_VINFO_TYPE to reduc_vec_info_type directly after
>>the call to vectorizable_condition(), but that looked worse.
>>I could back up the value of STMT_VINFO_TYPE before calling
>>vectorizable_condition() and then restore it after? I think thatâll look a
>>lot better.
>>
>>
>>>
>>>+
>>>+ /* For cond reductions we need to add an additional conditional
>>>based on
>>>+ the loop index. */
>>>+ if (v_reduc_type == COND_REDUCTION)
>>>+ {
>>>+ int nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
>>>+ int k;
>>>...
>>>+ STMT_VINFO_VECTYPE (index_vec_info) = cr_index_vector_type;
>>>+ set_vinfo_for_stmt (index_condition, index_vec_info);
>>>+
>>>+ /* Update the phi with the vec cond. */
>>>+ add_phi_arg (new_phi, cond_name, loop_latch_edge (loop),
>>>+ UNKNOWN_LOCATION);
>>>
>>>same as before - I am missing a comment that shows the generated code
>>>and connects
>>>the local vars used.
>>
>>Ok, Iâll add something
>>
>>>
>>>
>>>+ tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare));
>>>+ gimple ccompare_stmt = gimple_build_assign (ccompare_name,
>>>ccompare);
>>>+ gsi_insert_before (&vec_stmt_gsi, ccompare_stmt,
>>>GSI_SAME_STMT);
>>>+ gimple_assign_set_rhs1 (*vec_stmt, ccompare_name);
>>>
>>>hum - are you sure this works with ncopies > 1? Will it use the
>>>correct vec_stmt?
>>
>>We donât support this when ncopies >1.
>>
>>In vectorizable_reduction():
>>
>>if ((double_reduc || v_reduc_type == COND_REDUCTION) && ncopies > 1)
>> {
>> if (dump_enabled_p ())
>> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> "multiple types in double reduction or condition "
>> "reduction.\n");
>> return false;
>> }
>>
>>
>>
>>>
>>>I still dislike the v_reduc_type plastered and passed everywhere. Can
>>>you explore
>>>adding the reduction kind to stmt_info?
>>
>>Ok, I can do that.
>>
>>
>>Thanks for the comments.
>>Iâll put together a patch with the above changes.
>>
>>Thanks,
>>Alan.
>>
>>
>
> Richard, as requested I've updated with the follow changes:
>
> * AND and EQ replaced with a COND_EXPR
> * Better comments for the code gen, included references to variable names
> * Kept the VIEW_CONVERT_EXPR - we need this for when the data is float type
> * Backed up STMT_VINFO_TYPE before the call to vectorizable_condition()
> and restored after the call. I considered extracting the relvant parts of
> vectorizable_condition() into a sub function, but it had too many
> dependencies with the rest of vectorizable_condition().
> * v_reduc_type is now part of stmt_info
> * Created a diff using 50 lines of context
Ok with the vectorizable_condition call guarded as suggested instead.
Thanks,
Richard.
>
> Thanks,
> Alan.
>
>