On Wed, 16 Oct 2019 at 04:19, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Richard Biener <richard.guent...@gmail.com> writes:
> > On Tue, Oct 15, 2019 at 8:07 AM Prathamesh Kulkarni
> > <prathamesh.kulka...@linaro.org> wrote:
> >>
> >> On Wed, 9 Oct 2019 at 08:14, Prathamesh Kulkarni
> >> <prathamesh.kulka...@linaro.org> wrote:
> >> >
> >> > On Tue, 8 Oct 2019 at 13:21, Richard Sandiford
> >> > <richard.sandif...@arm.com> wrote:
> >> > >
> >> > > Leaving the main review to Richard, just some comments...
> >> > >
> >> > > Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> >> > > > @@ -9774,6 +9777,10 @@ vect_is_simple_cond (tree cond, vec_info *vinfo,
> >> > > >
> >> > > >    When STMT_INFO is vectorized as a nested cycle, for_reduction is true.
> >> > > >
> >> > > > +  For COND_EXPR<C, T, E> if T comes from masked load, and is conditional
> >> > > > +  on C, we apply loop mask to result of vector comparison, if it's present.
> >> > > > +  Similarly for E, if it is conditional on !C.
> >> > > > +
> >> > > >    Return true if STMT_INFO is vectorizable in this way.  */
> >> > > >
> >> > > >  bool
> >> > >
> >> > > I think this is a bit misleading.  But IMO it'd be better not to have
> >> > > a comment here and just rely on the one in the main function body.
> >> > > This optimisation isn't really changing the vectorisation strategy,
> >> > > and the comment could easily get forgotten if things change in future.
> >> > >
> >> > > > [...]
> >> > > > @@ -9999,6 +10006,35 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> >> > > >    /* Handle cond expr.  */
> >> > > >    for (j = 0; j < ncopies; j++)
> >> > > >      {
> >> > > > +      tree loop_mask = NULL_TREE;
> >> > > > +      bool swap_cond_operands = false;
> >> > > > +
> >> > > > +      /* Look up if there is a loop mask associated with the
> >> > > > +         scalar cond, or it's inverse.
*/
> >> > >
> >> > > Maybe:
> >> > >
> >> > >   See whether another part of the vectorized code applies a loop
> >> > >   mask to the condition, or to its inverse.
> >> > >
> >> > > > +
> >> > > > +      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> >> > > > +        {
> >> > > > +          scalar_cond_masked_key cond (cond_expr, ncopies);
> >> > > > +          if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> >> > > > +            {
> >> > > > +              vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> >> > > > +              loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> >> > > > +            }
> >> > > > +          else
> >> > > > +            {
> >> > > > +              bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> >> > > > +              cond.code = invert_tree_comparison (cond.code, honor_nans);
> >> > > > +              if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> >> > > > +                {
> >> > > > +                  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> >> > > > +                  loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
> >> > > > +                                                  vectype, j);
> >> > > > +                  cond_code = cond.code;
> >> > > > +                  swap_cond_operands = true;
> >> > > > +                }
> >> > > > +            }
> >> > > > +        }
> >> > > > +
> >> > > >        stmt_vec_info new_stmt_info = NULL;
> >> > > >        if (j == 0)
> >> > > >          {
> >> > > > @@ -10114,6 +10153,47 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> >> > > >             }
> >> > > >           }
> >> > > >        }
> >> > > > +
> >> > > > +      /* If loop mask is present, then AND it with
> >> > >
> >> > > Maybe "If we decided to apply a loop mask, ..."
> >> > >
> >> > > > +         result of vec comparison, so later passes (fre4)
> >> > >
> >> > > Probably better not to name the pass -- could easily change in future.
> >> > >
> >> > > > +         will reuse the same condition used in masked load.
> >> > >
> >> > > Could be a masked store, or potentially other things too.
> >> > > So maybe just "will reuse the masked condition"?
> >> > >
> >> > > > +
> >> > > > +         For example:
> >> > > > +         for (int i = 0; i < 100; ++i)
> >> > > > +           x[i] = y[i] ? z[i] : 10;
> >> > > > +
> >> > > > +         results in following optimized GIMPLE:
> >> > > > +
> >> > > > +         mask__35.8_43 = vect__4.7_41 != { 0, ... };
> >> > > > +         vec_mask_and_46 = loop_mask_40 & mask__35.8_43;
> >> > > > +         _19 = &MEM[base: z_12(D), index: ivtmp_56, step: 4, offset: 0B];
> >> > > > +         vect_iftmp.11_47 = .MASK_LOAD (_19, 4B, vec_mask_and_46);
> >> > > > +         vect_iftmp.12_52 = VEC_COND_EXPR <vec_mask_and_46,
> >> > > > +                                           vect_iftmp.11_47, { 10, ... }>;
> >> > > > +
> >> > > > +         instead of recomputing vec != { 0, ... } in vec_cond_expr  */
> >> > >
> >> > > That's true, but gives the impression that avoiding the vec != { 0, ... }
> >> > > is the main goal, whereas we could do that just by forcing a three-operand
> >> > > COND_EXPR.  It's really more about making sure that vec != { 0, ... }
> >> > > and its masked form aren't both live at the same time.  So maybe:
> >> > >
> >> > >   instead of using a masked and unmasked forms of
> >> > >   vect__4.7_41 != { 0, ... } (masked in the MASK_LOAD,
> >> > >   unmasked in the VEC_COND_EXPR).  */
> >> > >
> >> > Hi Richard,
> >> > Thanks for the suggestions, I have updated comments in the attached patch.
> >> Hi,
> >> The attached patch is rebased on trunk, and after the PR91532 fix, the
> >> hunk for fmla_2.c is no longer required.
> >
> > Hmm.  So we already record some mask info - you just add in addition
> > to that the scalar predicate representing the mask.  I wonder if you can
> > integrate that into the existing vec_loop_masks vector instead of
> > adding another data structure on the side?  Not that I am understanding
> > the existing fully masked code at all (or specifically what it computes
> > as nscalars_per_iter, etc. ... :/).
>
> We can AND several different scalar conditions with the same loop
> mask (that's relatively common), and could even AND the same scalar
> condition with different loop masks (although that's less likely).
> So I think having separate info makes sense.
>
> > At least add the new vinfo member right next to the other mask-related
> > fields.
>
> Agree that would be better.
>
> > @@ -10122,6 +10157,48 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> >             }
> >           }
> >        }
> > +
> > +      /* If we decided to apply a loop mask to result of vec
> > +         comparison, so later passes will reuse the same condition.
>
> Maybe:
>
>   /* If we decided to apply a loop mask to the result of the vector
>      comparison, AND the comparison with the mask now.  Later passes
>      should then be able to reuse the AND results between multiple
>      vector statements.
>
> > +         For example:
> > +         for (int i = 0; i < 100; ++i)
> > +           x[i] = y[i] ? z[i] : 10;
> > +
> > +         results in following optimized GIMPLE:
> > +
> > +         mask__35.8_43 = vect__4.7_41 != { 0, ... };
> > +         vec_mask_and_46 = loop_mask_40 & mask__35.8_43;
> > +         _19 = &MEM[base: z_12(D), index: ivtmp_56, step: 4, offset: 0B];
> > +         vect_iftmp.11_47 = .MASK_LOAD (_19, 4B, vec_mask_and_46);
> > +         vect_iftmp.12_52 = VEC_COND_EXPR <vec_mask_and_46,
> > +                                           vect_iftmp.11_47, { 10, ... }>;
> > +
> > +         instead of using a masked and unmasked forms of
> > +         vec != { 0, ... } (masked in the MASK_LOAD,
> > +         unmasked in the VEC_COND_EXPR).  */
>
> The last paragraph uses some space rather than tab indentation.
>
> > +/* If code(T) is comparison op or def of comparison stmt,
> > +   extract it's operands.
> > +   Else return <NE_EXPR, T, 0>.  */
> > +
> > +void
> > +scalar_cond_masked_key::get_cond_ops_from_tree (tree t)
> > +{
>
> Maybe:
>
> /* If the condition represented by T is a comparison or the SSA name
>    result of a comparison, extract the comparison's operands.  Represent
>    T as NE_EXPR <T, 0> otherwise.
>    */
>
> OK with those changes and the one Richard asked for.  Thanks!

Committed in r277141.
Thanks,
Prathamesh

>
> Thanks,
> Richard