On Mon, 18 May 2020, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > Hi,
> >
> > I'm trying to enforce SLP_TREE_VECTYPE being set (correctly) on
> > external and invariant SLP nodes to avoid (re-)computing that
> > in other places.
> 
> Nice.
> 
> > The responsible entity for specifying the
> > desired vector type is vectorizable_* - vectorization analysis
> > of the user (when we start to share those nodes between multiple
> > users a user can then also unshare such a node if an existing
> > vector type conflicts with its own requirements).
> >
> > The previous patch 
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545938.html
> > set up things so that vect_slp_analyze_node_operations would
> > use such type information during costing and also pick up
> > unshared nodes.
> >
> > Now as you can see below where I started to enforce SLP_TREE_VECTYPE
> > for the special boolean case in vect_get_constant_vectors I seemingly
> > only had to fixup two places, vectorizable_operation (saw bool & bool)
> > and vectorizable_comparison doing testing on x86_64 and aarch64
> > (cross cc1 on aarch64-sve.exp and vect.exp).  There obviously will
> > be more places and I was thinking of less ugly (and more correct?)
> > ways than doing
> >
> > +         FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (slp_node), i, child)
> > +           if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
> > +             SLP_TREE_VECTYPE (child) = vectype;
> >
> > in each vectorizable_*.  In principle it is the vect_is_simple_use
> > that would need adjustment to associate the correct SLP child with
> > a vectype so I can't think of sth better than refactoring things
> > to sth like
> >
> >   vect_simple_uses (vinfo, stmt_info, slp_node, &dts, &vectypes)
> >
> > and process all operans at once transparently for stmt_info/slp_node
> > (where the actual vect_is_simple_use check is redundant for SLP
> > nodes anyway).  The index into the array would then match up
> > with the SLP child (if SLP) or the stmt operand (if not SLP - with
> > SLP the order can be different on the stmt).
> >
> > Any better ideas or comments on the general direction?  Richard,
> > which vectorizable_* do you remember being the "worst" with respect
> > to vector type determining for invariants?
> 
> I think most of the struggles I had were from different code ending
> up with different results when recomputing the vectype.  IIRC the
> vectorizable_* routines usually got it right and it was the other
> code that got it wrong.  So yeah, can imagine that we don't need
> many changes if the vectoriable_* routines get the final say.
> 
> Not related to invariants specifically, but the routines I remember
> being most awkward for vector type choices were vectorizable_condition
> (because of the embedded comparison) and vectorizable_conversion (because
> there could be multiple stages, with the intermediate vector types not
> being recorded).
> 
> OTOH, I think calculating the ideal vector types requires
> propagating the vector type in both directions, rather than
> just upwards from the roots.  E.g. for:
> 
>   double d1, d2, d3, d4;
>   float f1, f2, f3;
> 
>   f1 = d1 == d2 & d3 == d4 ? f2 : f3;
> 
> when operating on full vectors, there needs to be a conversion
> somewhere between "d-booleans" and "f-booleans".  For this case
> it probably should happen as early as possible, e.g.:
> 
>   b1_lo = d1_lo == d2_lo
>   b1_hi = d1_hi == d2_hi
> 
>   b2_lo = d3_lo == d4_lo
>   b2_hi = d3_hi == d4_hi
> 
>   b1 = pack (b1_lo, b1_hi)
> 
>   b2 = pack (b2_lo, b2_hi)
> 
>   b3 = b1 & b2
> 
>   f1 = b3 ? f2 : f3;
> 
> But if the roles are reversed:
> 
>   float f1, f2, f3, f4;
>   double d1, d2, d3;
> 
>   d1 = f1 == f2 & f3 == f4 ? d2 : d3;
> 
> it should happen after the "&":
> 
>   b1 = f1 == f2
> 
>   b2 = f3 == f4
> 
>   b3 = b1 & b2
> 
>   b3_lo = unpack_lo (b3)
>   b3_hi = unpack_hi (b3)
> 
>   d1_lo = b3_lo ? d2_lo : d3_lo;
>   d1_hi = b3_hi ? d2_hi : d3_hi;
> 
> At the moment we rely on pattern matching for that.  But it basically
> means that the pattern matcher has to "predict" what vector types the
> vectorizable_* routines would calculate, so that it knows whether to
> create pattern statements that enforce a different choice.

Yeah, there's cases where it's difficult...

> IMO it'd be really good to get rid of that and unify the vector
> type calculation.  But that probably means not doing it during
> vectorizable_*, or at least changing the way that the vectorizable_*
> functions are called during analysis, so that information can flow in
> both directions.

Note currently we're committing to vector types even earlier than
vectorizable_* (SLP analysis, dataref analysis and 
mark_stmts_to_be_vectorized).  We also can't delay endlessly,
at the moment vectorizable_* rely on a fixed vectorization factor
and since those functions eventually check optabs for support they
also really need to have fixed vector types for an operations
input and output vectors.

Now - we could move towards vectorizable_* only computing
constraints for vector input/output types (say, a plus needs
compatible intput/output and the machine can do v2si and v4si)
and then some "magic" afterwards computes the optimal set of
types.  That would then require to dissect costing from
vectorizable_* until after the types are fixed.  And I fear
it wouldn't handle the case you cite above where we also
change the overall pattern - and thus may affect the
compatibility matrix of operations.

But yes, I'm currently facing the issue that SLP discovery
commits to a vectorization factor too early.  I'm going to
need to change the max_nunits thing to a min_vf which then
ends me at an iteration over possible VFs again (compared
to iteration over vector modes).  Not completely sure how
things will play out though.

Thanks,
Richard.

Reply via email to