Hi. Richi. I'd like to share more details that I want to do in VSETVL PASS.
Consider this following case: for for for ... for VSETVL demand: RATIO = 32 and TU policy. For this simple case, 'pre_edge_lcm_av' can perfectly work for us, will hoist "vsetvli e32,tu" to the outer-most loop. However, for this case: for for for ... for if (...) VSETVL 1 demand: RATIO = 32 and TU policy. else if (...) VSETVL 2 demand: SEW = 16. else VSETVL 3 demand: MU policy. 'pre_edge_lcm_av' is not sufficient to give us optimal codegen since VSETVL 1, VSETVL 2 and VSETVL 3 are 3 different VSETVL demands 'pre_edge_lcm_av' can only hoist one of them. Such case I can easily produce by RVV intrinsic and they are already in our RVV testsuite. To get the optimal codegen for this case, We need I call it as "Demand fusion" which is fusing all "compatible" VSETVLs into a single VSETVL then set them to avoid redundant VSETVLs. In this case, we should be able to fuse VSETVL 1, VSETVL 2 and VSETVL 3 into new VSETVL demand : SEW = 16, LMUL = MF2, TU, MU into a single new VSETVL demand. Instead of giving 'pre_edge_lcm_av' 3 VSETVL demands (VSETVL 1/2/3). I give 'pre_edge_lcm_av' only single 1 new VSETVL demand. Then, LCM PRE can hoist such fused VSETVL to the outer-most loop. So the program will be transformed as: VSETVL SEW = 16, LMUL = MF2, TU, MU for for for ... for if (...) ..... no vsetvl insn. else if (...) .... no vsetvl insn. else .... no vsetvl insn. So, how to do the demand fusion in this case? Before this patch and following RISC-V refactor patch, I do it explictly with my own decide algorithm. Meaning I calculate which location of the program to do the VSETVL fusion is correct and optimal. However, I found "compute_earliest" can help us to do the job for calculating the location of the program to do VSETVL fusion and turns out it's a quite more reliable and reasonable approach than I do. So that's why I export those 2 functions for us to be use in Phase 3 (Demand fusion) in RISC-V backend VSETVL PASS. Thanks. juzhe.zh...@rivai.ai From: Richard Biener Date: 2023-08-21 15:09 To: Juzhe-Zhong CC: gcc-patches; jeffreyalaw Subject: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend On Mon, 21 Aug 2023, Juzhe-Zhong wrote: > This patch exports 'compute_antinout_edge' and 'compute_earliest' as global > scope > which is going to be used in VSETVL PASS of RISC-V backend. > > The demand fusion is the fusion of VSETVL information to emit VSETVL which > dominate and pre-config for most > of the RVV instructions in order to elide redundant VSETVLs. > > For exmaple: > > for > for > for > if (cond} > VSETVL demand 1: SEW/LMUL = 16 and TU policy > else > VSETVL demand 2: SEW = 32 > > VSETVL pass should be able to fuse demand 1 and demand 2 into new demand: SEW > = 32, LMUL = M2, TU policy. > Then emit such VSETVL at the outmost of the for loop to get the most optimal > codegen and run-time execution. > > Currenty the VSETVL PASS Phase 3 (demand fusion) is really messy and > un-reliable as well as un-maintainable. > And, I recently read dragon book and morgan's book again, I found there > "earliest" can allow us to do the > demand fusion in a very reliable and optimal way. > > So, this patch exports these 2 functions which are very helpful for VSETVL > pass. It would be nice to put these internal functions into a class or a namespace given their non LCM name. I don't see how you are going to use these intermediate DF functions - they are just necessary to compute pre_edge_lcm_avs which I see you already do. Just to say you are possibly going to blow up compile-time complexity of your VSETVL dataflow problem? > gcc/ChangeLog: > > * lcm.cc (compute_antinout_edge): Export as global use. > (compute_earliest): Ditto. > (compute_rev_insert_delete): Ditto. > * lcm.h (compute_antinout_edge): Ditto. > (compute_earliest): Ditto. > > --- > gcc/lcm.cc | 7 ++----- > gcc/lcm.h | 3 +++ > 2 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/gcc/lcm.cc b/gcc/lcm.cc > index 94a3ed43aea..03421e490e4 100644 > --- a/gcc/lcm.cc > +++ b/gcc/lcm.cc > @@ -56,9 +56,6 @@ along with GCC; see the file COPYING3. If not see > #include "lcm.h" > > /* Edge based LCM routines. */ > -static void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, sbitmap > *); > -static void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap *, > - sbitmap *, sbitmap *, sbitmap *); > static void compute_laterin (struct edge_list *, sbitmap *, sbitmap *, > sbitmap *, sbitmap *); > static void compute_insert_delete (struct edge_list *edge_list, sbitmap *, > @@ -79,7 +76,7 @@ static void compute_rev_insert_delete (struct edge_list > *edge_list, sbitmap *, > This is done based on the flow graph, and not on the pred-succ lists. > Other than that, its pretty much identical to compute_antinout. */ > > -static void > +void > compute_antinout_edge (sbitmap *antloc, sbitmap *transp, sbitmap *antin, > sbitmap *antout) > { > @@ -170,7 +167,7 @@ compute_antinout_edge (sbitmap *antloc, sbitmap *transp, > sbitmap *antin, > > /* Compute the earliest vector for edge based lcm. */ > > -static void > +void > compute_earliest (struct edge_list *edge_list, int n_exprs, sbitmap *antin, > sbitmap *antout, sbitmap *avout, sbitmap *kill, > sbitmap *earliest) > diff --git a/gcc/lcm.h b/gcc/lcm.h > index e08339352e0..7145d6fc46d 100644 > --- a/gcc/lcm.h > +++ b/gcc/lcm.h > @@ -31,4 +31,7 @@ extern struct edge_list *pre_edge_rev_lcm (int, sbitmap *, > sbitmap *, sbitmap *, > sbitmap *, sbitmap **, > sbitmap **); > +extern void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, sbitmap > *); > +extern void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap *, > + sbitmap *, sbitmap *, sbitmap *); > #endif /* GCC_LCM_H */ > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)