Re: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend

juzhe.zh...@rivai.ai Mon, 21 Aug 2023 00:47:49 -0700

Hi. Richi.
I'd like to share more details that I want to do in VSETVL PASS.

Consider this following case:

for
  for 
    for
      ...
         for
             VSETVL demand: RATIO = 32 and TU policy.

For this simple case, 'pre_edge_lcm_av' can perfectly work for us, will hoist 
"vsetvli e32,tu" to the outer-most loop.

However, for this case:
  for
  for 
    for
      ...
         for
           if (...)
             VSETVL 1 demand: RATIO = 32 and TU policy.
           else if (...)
             VSETVL 2 demand: SEW = 16.
           else
             VSETVL 3 demand: MU policy.

'pre_edge_lcm_av' is not sufficient to give us optimal codegen since VSETVL 1,  
VSETVL 2 and VSETVL 3 are 3 different VSETVL demands
'pre_edge_lcm_av' can only hoist one of them. Such case I can easily produce by 
RVV intrinsic and they are already in our RVV testsuite.

To get the optimal codegen for this case,  We need I call it as "Demand fusion" 
which is fusing all "compatible" VSETVLs into a single VSETVL
then set them to avoid redundant VSETVLs.

In this case, we should be able to fuse VSETVL 1, VSETVL 2 and VSETVL 3 into 
new VSETVL demand : SEW = 16, LMUL = MF2, TU, MU into a single 
new VSETVL demand. Instead of giving 'pre_edge_lcm_av' 3 VSETVL demands (VSETVL 
1/2/3). I give 'pre_edge_lcm_av' only single 1 new VSETVL demand.
Then, LCM PRE can hoist such fused VSETVL to the outer-most loop. So the 
program will be transformed as:

VSETVL SEW = 16, LMUL = MF2, TU, MU
  for
  for 
    for
      ...
         for
           if (...) 
             .....   no vsetvl insn.
           else if (...)
             ....  no vsetvl insn.

           else
             ....  no vsetvl insn.

So, how to do the demand fusion in this case? 
Before this patch and following RISC-V refactor patch, I do it explictly with 
my own decide algorithm.
Meaning I calculate which location of the program to do the VSETVL fusion is 
correct and optimal.

However, I found "compute_earliest" can help us to do the job for calculating 
the location of the program to do VSETVL fusion and
turns out it's a quite more reliable and reasonable approach than I do.

So that's why I export those 2 functions for us to be use in Phase 3 (Demand 
fusion) in RISC-V backend VSETVL PASS.

Thanks.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-08-21 15:09
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw
Subject: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS 
use in RISC-V backend
On Mon, 21 Aug 2023, Juzhe-Zhong wrote:

> This patch exports 'compute_antinout_edge' and 'compute_earliest' as global 
> scope
> which is going to be used in VSETVL PASS of RISC-V backend.
> 
> The demand fusion is the fusion of VSETVL information to emit VSETVL which 
> dominate and pre-config for most
> of the RVV instructions in order to elide redundant VSETVLs.
> 
> For exmaple:
> 
> for
>  for
>   for
>     if (cond}
>       VSETVL demand 1: SEW/LMUL = 16 and TU policy
>     else
>       VSETVL demand 2: SEW = 32
> 
> VSETVL pass should be able to fuse demand 1 and demand 2 into new demand: SEW 
> = 32, LMUL = M2, TU policy.
> Then emit such VSETVL at the outmost of the for loop to get the most optimal 
> codegen and run-time execution.
> 
> Currenty the VSETVL PASS Phase 3 (demand fusion) is really messy and 
> un-reliable as well as un-maintainable.
> And, I recently read dragon book and morgan's book again, I found there 
> "earliest" can allow us to do the
> demand fusion in a very reliable and optimal way.
> 
> So, this patch exports these 2 functions which are very helpful for VSETVL 
> pass.

It would be nice to put these internal functions into a class or a
namespace given their non LCM name.  I don't see how you are going
to use these intermediate DF functions - they are just necessary
to compute pre_edge_lcm_avs which I see you already do.  Just to say
you are possibly going to blow up compile-time complexity of your
VSETVL dataflow problem?

> gcc/ChangeLog:
> 
> * lcm.cc (compute_antinout_edge): Export as global use.
> (compute_earliest): Ditto.
> (compute_rev_insert_delete): Ditto.
> * lcm.h (compute_antinout_edge): Ditto.
> (compute_earliest): Ditto.
> 
> ---
>  gcc/lcm.cc | 7 ++-----
>  gcc/lcm.h  | 3 +++
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/lcm.cc b/gcc/lcm.cc
> index 94a3ed43aea..03421e490e4 100644
> --- a/gcc/lcm.cc
> +++ b/gcc/lcm.cc
> @@ -56,9 +56,6 @@ along with GCC; see the file COPYING3.  If not see
>  #include "lcm.h"
>  
>  /* Edge based LCM routines.  */
> -static void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, sbitmap 
> *);
> -static void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap *,
> -       sbitmap *, sbitmap *, sbitmap *);
>  static void compute_laterin (struct edge_list *, sbitmap *, sbitmap *,
>       sbitmap *, sbitmap *);
>  static void compute_insert_delete (struct edge_list *edge_list, sbitmap *,
> @@ -79,7 +76,7 @@ static void compute_rev_insert_delete (struct edge_list 
> *edge_list, sbitmap *,
>     This is done based on the flow graph, and not on the pred-succ lists.
>     Other than that, its pretty much identical to compute_antinout.  */
>  
> -static void
> +void
>  compute_antinout_edge (sbitmap *antloc, sbitmap *transp, sbitmap *antin,
>         sbitmap *antout)
>  {
> @@ -170,7 +167,7 @@ compute_antinout_edge (sbitmap *antloc, sbitmap *transp, 
> sbitmap *antin,
>  
>  /* Compute the earliest vector for edge based lcm.  */
>  
> -static void
> +void
>  compute_earliest (struct edge_list *edge_list, int n_exprs, sbitmap *antin,
>    sbitmap *antout, sbitmap *avout, sbitmap *kill,
>    sbitmap *earliest)
> diff --git a/gcc/lcm.h b/gcc/lcm.h
> index e08339352e0..7145d6fc46d 100644
> --- a/gcc/lcm.h
> +++ b/gcc/lcm.h
> @@ -31,4 +31,7 @@ extern struct edge_list *pre_edge_rev_lcm (int, sbitmap *,
>     sbitmap *, sbitmap *,
>     sbitmap *, sbitmap **,
>     sbitmap **);
> +extern void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, sbitmap 
> *);
> +extern void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap *,
> +       sbitmap *, sbitmap *, sbitmap *);
>  #endif /* GCC_LCM_H */
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend

Reply via email to