On Mon, 18 May 2020, Richard Sandiford wrote: > Richard Biener <rguent...@suse.de> writes: > > Hi, > > > > I'm trying to enforce SLP_TREE_VECTYPE being set (correctly) on > > external and invariant SLP nodes to avoid (re-)computing that > > in other places. > > Nice. > > > The responsible entity for specifying the > > desired vector type is vectorizable_* - vectorization analysis > > of the user (when we start to share those nodes between multiple > > users a user can then also unshare such a node if an existing > > vector type conflicts with its own requirements). > > > > The previous patch > > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545938.html > > set up things so that vect_slp_analyze_node_operations would > > use such type information during costing and also pick up > > unshared nodes. > > > > Now as you can see below where I started to enforce SLP_TREE_VECTYPE > > for the special boolean case in vect_get_constant_vectors I seemingly > > only had to fixup two places, vectorizable_operation (saw bool & bool) > > and vectorizable_comparison doing testing on x86_64 and aarch64 > > (cross cc1 on aarch64-sve.exp and vect.exp). There obviously will > > be more places and I was thinking of less ugly (and more correct?) > > ways than doing > > > > + FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (slp_node), i, child) > > + if (SLP_TREE_DEF_TYPE (child) != vect_internal_def) > > + SLP_TREE_VECTYPE (child) = vectype; > > > > in each vectorizable_*. In principle it is the vect_is_simple_use > > that would need adjustment to associate the correct SLP child with > > a vectype so I can't think of sth better than refactoring things > > to sth like > > > > vect_simple_uses (vinfo, stmt_info, slp_node, &dts, &vectypes) > > > > and process all operans at once transparently for stmt_info/slp_node > > (where the actual vect_is_simple_use check is redundant for SLP > > nodes anyway). The index into the array would then match up > > with the SLP child (if SLP) or the stmt operand (if not SLP - with > > SLP the order can be different on the stmt). > > > > Any better ideas or comments on the general direction? Richard, > > which vectorizable_* do you remember being the "worst" with respect > > to vector type determining for invariants? > > I think most of the struggles I had were from different code ending > up with different results when recomputing the vectype. IIRC the > vectorizable_* routines usually got it right and it was the other > code that got it wrong. So yeah, can imagine that we don't need > many changes if the vectoriable_* routines get the final say. > > Not related to invariants specifically, but the routines I remember > being most awkward for vector type choices were vectorizable_condition > (because of the embedded comparison) and vectorizable_conversion (because > there could be multiple stages, with the intermediate vector types not > being recorded). > > OTOH, I think calculating the ideal vector types requires > propagating the vector type in both directions, rather than > just upwards from the roots. E.g. for: > > double d1, d2, d3, d4; > float f1, f2, f3; > > f1 = d1 == d2 & d3 == d4 ? f2 : f3; > > when operating on full vectors, there needs to be a conversion > somewhere between "d-booleans" and "f-booleans". For this case > it probably should happen as early as possible, e.g.: > > b1_lo = d1_lo == d2_lo > b1_hi = d1_hi == d2_hi > > b2_lo = d3_lo == d4_lo > b2_hi = d3_hi == d4_hi > > b1 = pack (b1_lo, b1_hi) > > b2 = pack (b2_lo, b2_hi) > > b3 = b1 & b2 > > f1 = b3 ? f2 : f3; > > But if the roles are reversed: > > float f1, f2, f3, f4; > double d1, d2, d3; > > d1 = f1 == f2 & f3 == f4 ? d2 : d3; > > it should happen after the "&": > > b1 = f1 == f2 > > b2 = f3 == f4 > > b3 = b1 & b2 > > b3_lo = unpack_lo (b3) > b3_hi = unpack_hi (b3) > > d1_lo = b3_lo ? d2_lo : d3_lo; > d1_hi = b3_hi ? d2_hi : d3_hi; > > At the moment we rely on pattern matching for that. But it basically > means that the pattern matcher has to "predict" what vector types the > vectorizable_* routines would calculate, so that it knows whether to > create pattern statements that enforce a different choice.
Yeah, there's cases where it's difficult... > IMO it'd be really good to get rid of that and unify the vector > type calculation. But that probably means not doing it during > vectorizable_*, or at least changing the way that the vectorizable_* > functions are called during analysis, so that information can flow in > both directions. Note currently we're committing to vector types even earlier than vectorizable_* (SLP analysis, dataref analysis and mark_stmts_to_be_vectorized). We also can't delay endlessly, at the moment vectorizable_* rely on a fixed vectorization factor and since those functions eventually check optabs for support they also really need to have fixed vector types for an operations input and output vectors. Now - we could move towards vectorizable_* only computing constraints for vector input/output types (say, a plus needs compatible intput/output and the machine can do v2si and v4si) and then some "magic" afterwards computes the optimal set of types. That would then require to dissect costing from vectorizable_* until after the types are fixed. And I fear it wouldn't handle the case you cite above where we also change the overall pattern - and thus may affect the compatibility matrix of operations. But yes, I'm currently facing the issue that SLP discovery commits to a vectorization factor too early. I'm going to need to change the max_nunits thing to a min_vf which then ends me at an iteration over possible VFs again (compared to iteration over vector modes). Not completely sure how things will play out though. Thanks, Richard.