Hi Richi,

Thanks for the review! 

Just some answers to your questions:

> -----Original Message-----
> From: rguent...@c653.arch.suse.de <rguent...@c653.arch.suse.de> On
> Behalf Of Richard Biener
> Sent: Monday, September 28, 2020 1:37 PM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; o...@ucw.cz
> Subject: Re: [PATCH v2 3/16]middle-end Add basic SLP pattern matching
> scaffolding.
> 
> On Fri, 25 Sep 2020, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch adds the basic infrastructure for doing pattern matching on SLP
> trees.
> > This is done immediately after the SLP tree creation because it can
> > change the shape of the tree in radical ways and so we would like to
> > do it before any analysis is performed on the tree.
> >
> > A new file tree-vect-slp-patterns.c is added which contains all the
> > code for pattern matching on SLP trees.
> >
> > This cover letter is short because the changes are heavily commented.
> >
> > All pattern matchers need to implement the abstract type
> VectPatternMatch.
> > The VectSimplePatternMatch abstract class provides some default
> > functionality for pattern matchers that need to rebuild nodes.
> >
> > The pattern matcher requires if replacing a statement in a node, that
> > ALL statements be replaced.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> +    gcall *build ()
> +    {
> +      stmt_vec_info stmt_info;
> +
> 
> please define functions out-of-line (apart from the 1-liners)
> 
> +      /* We have to explicitly mark the old statement as unused because
> during
> +        statement analysis the original and new pattern statement may
> require
> +        different level of unrolling.  As an example add/sub when
> vectorized
> +        without a pattern requires 4 copies, whereas with a COMPLEX_ADD
> pattern
> +        this only requires 2 copies and the two statement will be
> + treated
> as
> +        hand unrolled.  That means that the analysis won't happen as
> it'll find
> +        a mismatch.  So we don't analyze the old statement and if we
> + end
> up
> +        needing it, e.g. SLP fails then we have to quickly re-analyze it.
> */
> +      STMT_VINFO_RELEVANT (stmt_info) = vect_unused_in_scope;
> +      STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;
> +      STMT_VINFO_RELATED_STMT (call_stmt_info) = stmt_info;
> 
> so this means all uses have to be inside the pattern as otherwise there may
> be even non-SLP uses.  vect_mark_pattern_stmts supports detecting
> patterns of patterns, I suppose the two-phase analysis for SLP patterns does
> not support this right now?
> 
> +      SLP_TREE_CODE (this->m_node) = gimple_expr_code (call_stmt);;
> 
> double ;, just make it CALL_EXPR literally (or leave it ERROR_MARK)
> 
> You seem to do in-place changing of the SLP node you match off?

Yes since this would allow me to change the root node as well, though
thinking about it I can probably do it by passing it as a reference which
then would allow me to re-use vect_create_new_slp_node which is
probably preferable. 

> 
> @@ -2192,6 +2378,17 @@ vect_analyze_slp_instance (vec_info *vinfo,
>                               &tree_size, bst_map);
>    if (node != NULL)
>      {
> +      /* Temporarily allow add_stmt calls again.  */
> +      vinfo->stmt_vec_info_ro = false;
> +
> +      /* See if any patterns can be found in the constructed SLP tree
> +        before we do any analysis on it.  */
> +      vect_match_slp_patterns (node, vinfo, group_size, &max_nunits,
> +                              matches, &npermutes, &tree_size,
> + bst_map);
> +
> +      /* After this no more add_stmt calls are allowed.  */
> +      vinfo->stmt_vec_info_ro = true;
> +
> 
> I think this is a bit early to match patterns - I'd defer it to the point 
> where all
> entries into the same SLP subgraph are analyzed, thus somewhere at the
> end of vect_analyze_slp loop over all instances and match patterns?  That
> way phases are more clearly separated.

That would probably work, my only worry is that the SLP analysis itself may 
fail and
bail out at 

          /* If the loads and stores can be handled with load/store-lane
             instructions do not generate this SLP instance.  */
          if (is_a <loop_vec_info> (vinfo)
              && loads_permuted
              && dr && vect_store_lanes_supported (vectype, group_size, false))

Which in the initial tree may be true, but in the patterned tree may not be.  
In the previous
revision of the patch you had suggested I return a boolean which can be used to 
cancel such
checks.  Would that be the preferred approach?

> 
> Note that fiddling with vinfo->stmt_vec_info_ro is a bit ugly, maybe add a -
> >add_pattern_stmt (gimple *pattern_stmt, stmt_vec_info
> orig_stmt) variant that also sets STMT_VINFO_RELATED_STMT but doesn't
> check !stmt_vec_info_ro.  That could be used from tree-vect-patterns.c as
> well and we could set stmt_vec_info_ro earlier.
> 
> +  VectPattern *pattern = patt_fn (node, vinfo);  uint8_t n =
> + pattern->get_arity ();
> +
> +  if (group_size % n != 0)
> +    {
> +      delete pattern;
> 
> seems to require VectPattern allocation even upon failure, I suggest to
> return NULL then to avoid excessive allocations.
> 
> +      if (!pattern->matches (stmt_infos, i))
> +       {
> +         /* We can only do replacements for entire groups, we must
> replace all
> +            statements in a node as the argument list/children may not
> have
> +            equal height then.  Operations that don't rewrite the
> arguments
> +            may be safe to do, so perhaps paramatrise it.  */
> +
> +         found_p = false;
> 
> I find it a bit ugly to iterate over "unrolls" in the machinery rather than 
> the
> individual pattern matcher which might have an easier and in particular
> cheaper job here.  Since you require
> all lanes to match the same pattern anyway.   Not sure if your
> later patches support say, mixing complex add with different rotate in the
> same SLP node.

It does, as the constraint only applies to one pattern matcher class handling 
the
entire node.

An example of such case is

node 0x531a1f0 (max_nunits=2, refcnt=2)
 stmt 0 *_9 = _10;
 stmt 1 *_15 = _16;
 stmt 2 *_25 = _26;
 stmt 3 *_31 = _32;
 children 0x531a980
node 0x531a980 (max_nunits=2, refcnt=2)
 stmt 0 slp_patt_112 = .COMPLEX_ADD_ROT90 (_4, _14);
 stmt 1 slp_patt_111 = .COMPLEX_ADD_ROT90 (_12, _8);
 stmt 2 slp_patt_110 = .COMPLEX_ADD_ROT270 (_20, _30);
 stmt 3 slp_patt_109 = .COMPLEX_ADD_ROT270 (_28, _24);
 lane permutation { 0[0] 1[1] 1[2] 0[3] }
 children 0x5310680 0x530e040
node 0x5310680 (max_nunits=2, refcnt=4)
 stmt 0 _4 = *_3;
 stmt 1 _12 = *_11;
 stmt 2 _20 = *_19;
 stmt 3 _28 = *_27;
 load permutation { 0 1 2 3 }
node 0x530e040 (max_nunits=2, refcnt=2)
 stmt 0 _14 = *_13;
 stmt 1 _8 = *_7;
 stmt 2 _30 = *_29;
 stmt 3 _24 = *_23;
 load permutation { 0 1 2 3 }

though looking at the resulting assembly the code is incorrect,

.L6:
        ldr     q1, [x1, x3]
        ldr     q0, [x0, x3]
        fcadd   v0.2d, v0.2d, v1.2d, #270
        str     q0, [x2, x3]
        ldr     q1, [x5, x3]
        ldr     q0, [x6, x3]
        fcadd   v0.2d, v0.2d, v1.2d, #270
        str     q0, [x4, x3]
        add     x3, x3, 32
        cmp     x3, 1600
        bne     .L6
        ret

Which I assume is because SLP_TREE_REPRESENTATIVE is pointing to the rotate 270?

> Note the ultimate idea in the end is that a SLP node can, of
> course, be split into two [but at this point the vector type / unroll factor 
> is not
> final so general splitting at vector boundary is not desired yet].
> The split can be undone for consumers by inserting a VEC_PERM node (which
> should semantically be a concat + select)
> 
> +      tree type = gimple_expr_type (STMT_VINFO_STMT (stmt_info));
> +      tree vectype = get_vectype_for_scalar_type (vinfo, type, node);
> 
> use
> 
>       tree vectype = SLP_TREE_VECTYPE (node);
> 
> generally avoid looking at scalar stmts, iff then look at
> SLP_TREE_REPRESENTATIVE - all lanes have uniform operations applied to
> (but the scalar stmts may not appear to do so!  the scalar stmts merely stand
> for their 'def').
> 
> +  /* Perform recursive matching, it's important to do this after
> + matching
> things
> +     in the current node as the matches here may re-order the nodes
> + below
> it.
> +     As such the pattern that needs to be subsequently match may change.
> */
> +
> +  if (SLP_TREE_CHILDREN (node).exists ()) {
> +    slp_tree child;
> +    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> +      found_rec_p |= vect_match_slp_patterns_2 (child, vinfo, group_size,
> +                                               patt_fn, max_nunits,
> matches,
> +                                               npermutes, tree_size,
> bst_map);
> +  }
> 
> 
> you definitely need a visited set - you are walking a graph and nodes can
> appear along multiple paths!
> 
> +      vect_mark_slp_stmts_relevant (node);
> 
> that walks the whole subgraph but if you need to do anything you at most
> want to touch the node itself, no?
> 
> To make patterns-of-patterns viable you need to do all parts of the walk in
> post-order.  What breaks if you do ->matches/->validate in post-order?  I
> think that would be more future-proof.

You lose the ability to match the longest pattern. As an example the complex 
add and
complex fma patterns overlap. Right now I can try matching the fma first and 
then add.
But doing it in post order the fma woud never match as the subtree would be too 
small
and the add would always match.

Aside from that it makes it very difficult to rebuild the subtrees as the SSA 
names have changed (since build
Is already done in post order),
So right now I can use e.g. _3, _4 etc, however if the patterns have already 
been applied I would
need to know what their replacements are since build () would replace them and 
you lose the ability to navigate
by SSA name.

Regards,
Tamar

> 
> Otherwise this looks like an OK overall design.
> 
> Thanks for working on it!
> 
> Richard.
> 
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >     * Makefile.in (tree-vect-slp-patterns.o): New.
> >     * doc/passes.texi: Update documentation.
> >     * tree-vect-slp.c (vect_match_slp_patterns_2,
> vect_match_slp_patterns):
> >     New.
> >     (vect_analyze_slp_instance): Call pattern matcher.
> >     * tree-vectorizer.h (class VectPatternMatch, class VectPattern): New.
> >     * tree-vect-slp-patterns.c: New file.
> >
> >
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imend

Reply via email to