Hi Richi, Thanks for the feedback, if you wouldn't mind indulging me quickly (for the version for next stage-1)
The original approach we had was indeed using a vec-pattern which turned best_i_11 = _4 > best_26 ? i_27 : best_i_25; best_13 = MAX_EXPR <_4, best_26>; into best_13 = MAX_EXPR <_4, best_26>; best_i_11 = _4 == best_13 ? best_i_25 : i_27; which was: static gimple * vect_recog_minmax_index_pattern (stmt_vec_info stmt_vinfo, tree *type_out) { tree oprnd0, oprnd1; gimple *last_stmt = stmt_vinfo->stmt; vec_info *vinfo = stmt_vinfo->vinfo; tree type; gimple *use_stmt; imm_use_iterator iter; gimple_stmt_iterator gsi; /* Starting from LAST_STMT, follow the defs of its uses in search of the above pattern. */ if (!vect_reassociating_reduction_simple_p (stmt_vinfo, MAX_EXPR, &oprnd0, &oprnd1)) return NULL; type = gimple_expr_type (last_stmt); if (!is_a <gphi *> (SSA_NAME_DEF_STMT (oprnd1))) return NULL; stmt_vec_info phy_vinfo = vinfo->lookup_def (oprnd1); if (!phy_vinfo) return NULL; basic_block top_bb = gimple_bb (last_stmt); bool found_p = false; FOR_EACH_IMM_USE_STMT (use_stmt, iter, oprnd1) { if (is_gimple_assign (use_stmt) && gimple_assign_rhs_code (use_stmt) == COND_EXPR) { basic_block bb = gimple_bb (use_stmt); if (bb == top_bb && gimple_uid (use_stmt) < gimple_uid (last_stmt)) { tree cond = gimple_assign_rhs1 (use_stmt); if (TREE_CODE (cond) != GT_EXPR) continue; tree true_b = gimple_assign_rhs2 (use_stmt); tree false_b = gimple_assign_rhs3 (use_stmt); TREE_SET_CODE (cond, NE_EXPR); TREE_OPERAND (cond, 1) = gimple_assign_lhs (last_stmt); gimple_set_op (use_stmt, 3, false_b); gimple_set_op (use_stmt, 2, true_b); gimple_seq *def_seq = &STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo); gsi = gsi_for_stmt (use_stmt, def_seq); gsi_remove (&gsi, false); gsi = gsi_for_stmt (last_stmt, def_seq); gimple_set_bb (use_stmt, bb); gsi_insert_after (&gsi, use_stmt, GSI_SAME_STMT); vect_pattern_detected ("vect_recog_minmax_index_pattern", last_stmt); found_p = true; } } } if (found_p) { STMT_VINFO_DEF_TYPE (phy_vinfo) = vect_min_max_reduction_def; } return NULL; } So it's rewriting the statement into one where the recursive phi node is only used once And marking the reduction as a vect_min_max_reduction_def to handle it correctly later on. There was two things that bugged be about it though. Returning NULL from the vect-pattern felt odd. We could return the last statement here as a the pattern, but since we also have to re-order the statements in the BB it seemed better not to treat it as a simple pattern. This made me think it really didn't belong here and that it's not really a vectorizer pattern, but rather a work around. Maybe this should be done in ifconv? Where the conversions are created in the first place? Also doing this seemed to require a lot of generic boiler plate changes to support vect_min_max_reduction_def, which also seemed a bit messy. Any thoughts here? On a side note, this code in vect_create_epilog_for_reduction for (elt_offset = nelements / 2; elt_offset >= 1; elt_offset /= 2) { calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel); indices.new_vector (sel, 2, nelements); tree mask = vect_gen_perm_mask_any (vectype1, indices); new_name = gimple_build (&stmts, VEC_PERM_EXPR, vectype1, new_temp, zero_vec, mask); new_temp = gimple_build (&stmts, code, vectype1, new_name, new_temp); } gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); ..... rhs = build3 (BIT_FIELD_REF, scalar_type, new_temp, bitsize, bitsize_zero_node); epilog_stmt = gimple_build_assign (new_scalar_dest, rhs); Seems like a very roundabout way of doing a reduction.. unless I have misunderstood something? If "code" is PLUS, MIN, MAX etc wouldn't a reduction call here be better if the target supports it? This is quite hard to patch up later in combine particularly for byte of hf modes where the number of statements you'd need to match exceed combine's maximum. Thanks, Tamar > -----Original Message----- > From: Richard Biener <rguent...@suse.de> > Sent: Monday, November 18, 2019 11:24 > To: Joel Hutton <joel.hut...@arm.com> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Tamar Christina > <tamar.christ...@arm.com>; nd <n...@arm.com> > Subject: Re: [RFC][GCC][AArch64] Add minmax phi-reduction pattern > > On Fri, 15 Nov 2019, Joel Hutton wrote: > > > Forgot to CC maintainer. > > On 15/11/2019 18:03, Joel wrote: > > > Hi all, > > > > > > Just looking for some feedback on the approach. > > > > > > Currently the loop vectorizer can't vectorize the following typical > > > loop for getting max value and index from an array: > > > > > > void test_vec(int *data, int n) { > > > int best_i, best = 0; > > > > > > for (int i = 0; i < n; i++) { > > > if (data[i] > best) { > > > best = data[i]; > > > best_i = i; > > > } > > > } > > > > > > data[best_i] = data[0]; > > > data[0] = best; > > > } > > > > > > This patch adds some support for this pattern. > > > > > > This patch addresses Bug 88259. > > > > > > Regression testing is still in progress, This patch does not work > > > correctly with vect-epilogues-nomask, and the reason for that is > > > still being investigated. > > Quick posting before stage1 ends, heh? > > New functions lack comments, new fields in stmt_info as well, accesses > should go through (missing) STMT_VINFO_* macros. > > You are removing a safety check without replacement elsewhere: > > - /* Check there's only a single stmt the op is used on inside > - of the loop. */ > - imm_use_iterator imm_iter; > - gimple *op_use_stmt; > - unsigned cnt = 0; > - FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op) > - if (!is_gimple_debug (op_use_stmt) > - && flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt))) > - cnt++; > - if (cnt != 1) > - { > - fail = true; > - break; > - } > > AFAICS you still analyze both PHIs but the correct thing to do here is to view > both SSA cycles as a single one - when analyzing > > # best_i_25 = PHI <best_i_11(8), best_i_16(D)(18)> > # best_26 = PHI <best_13(8), 0(18)> > ... > best_i_11 = _4 <= best_26 ? best_i_25 : i_27; > best_13 = MAX_EXPR <_4, best_26>; > > and starting from the best_i_25 PHI nothing prevents check_reduction_path > SCC finding to walk to the best_26 cycle but luck(?). > > And the sanity check should be that all uses are within the detected cycle > (which then would be the case). > > So your handling looks like an afterthought - exactly going backwards of my > attempts to make the code better structured :/ > > The code-gen you do in vect_create_epilog_for_reduction needs a comment > - vect_create_epilog_for_reduction will be called twice in unspecified order, > so clearly unconditionally accessing stmt_info->reduc_related_stmt- > >reduc_minmax_epilog_stmt > (if it is what it looks like...) is not going to work. You are also using > IFN_REDUC_MAX which possibly isn't available. > > So I think what you want to do is "detect" the SCC with two PHIs, then - > maybe in tree-vect-patterns replace it with a single PHI and a single > operation, or somehow mangle it into a SLP tree to make it a single reduction. > > So please see to all this for next stage1. > > Thanks, > Richard. > > > > gcc/ChangeLog: > > > > > > > > > 2019-11-15 Joel Hutton <joel.hut...@arm.com> > > > Tamar Christina <tamar.christ...@arm.com> > > > > > > PR tree-optimization/88259 > > > * tree-vect-loop.c (vect_reassociating_reduction_simple_p): New > > > function. > > > (vect_recog_minmax_index_pattern): New function. > > > (vect_is_simple_reduction): Add check for minmax pattern. > > > (vect_model_reduction_cost): Add case for minmax pattern. > > > (vect_create_epilog_for_reduction): Add fixup for minmax epilog. > > > * tree-vectorizer.h (enum vect_reduction_type): Add > > > INDEX_MINMAX_REDUCTION reduction type.