[GitHub] [incubator-tvm] DKXXXL opened a new issue #4468: [RFC] Data-flow Analysis Functionality on TVM IR

GitBox Wed, 04 Dec 2019 16:37:39 -0800

DKXXXL opened a new issue #4468: [RFC] Data-flow Analysis Functionality on TVM 
IR
URL: https://github.com/apache/incubator-tvm/issues/4468
 
 
   # Problem
   
   When developing program passes on TVM IR (the one once was Halide IR), it is 
normal to ask for all sorts of information requiring program analysis, for 
example, live variable analysis for dead code elimination. This requirement 
becomes urgent when TVM has to directly issue intrinsic and the subsequent 
processing stage (for example LLVM) cannot analyze these outputs. 
   
   # Consideration
   
   *  Program Analysis Functionality must be consistent with the immutable 
style of TVM IR/AST
   *  A Simple Framework is required to eliminate as much boilerplate as 
possible when the programmer tries to come up with other types of data-flow 
analysis
   *  The interface must be easy to adapt to
   
   We first start with a concrete example -- Live Variable Analysis.
   
   ###  Live Variable Analysis
   
   I will take Live Variable Analysis as an example. I will define a functor, 
**inputting** *an AST* and *the set of live variable* after the AST is 
executed, **outputting** *the set of live variable* before the AST is executed.
   
   ```cpp
   class LiveAnalysisFunctor : public ExprFunctor<set(const Expr&, const set&)>,
                                public StmtFunctor<set(const Stmt&, const set&)>
   ```
   
   This functor can be easily defined inductively on the *AST*, demonstrated in 
pseudo-code
   ```haskell
   -- the function below taking AST and post-live-variables
   LiveAnalysisFunctor (Block{stmtX,stmtY}) postAlive = 
        let intermediateAlive = LiveAnalysisFunctor stmtY postAlive
        in LiveAnalysisFunctor stmtX intermediateAlive
   ```
   In other words, we propagate up the live variable from the end to the 
beginning, **according to the execution order of TVM IR**
   
   When encountering *if*-statement, we do similarly
   ```haskell
   LiveAnalysisFunctor (IfStmt{cond,branchThen, branchElse}) postAlive = 
        let beforeThen = LiveAnalysisFunctor branchThen postAlive
             beforeElse   = LiveAnalysisFunctor branchElse postAlive
             afterConditional = Union(beforeThen, beforeElse)
        in LiveAnalysisFunctor cond afterConditional
   ```
   The computation of *for*-statement is the only that requiring fixpoint 
computation. 
   
   Also, we know at the endpoint, the live variables will be empty. 
   
   Several (only relevant to live variable analysis) trivial fact:
   * two variables are the same when they are declared at the same place in the 
program (since we have scopes in the AST, we also have variables with the same 
name)
   * We have to distinguish, for each variable/FunctionRef, whether it is def 
or use
   
   #### Memorization
   
   You can see that in the current situation, to have the live variable 
information at arbitrary point, we need to calculate from very end to the point 
(and if that point is inside a for-loop, we need to make sure the fixpoint is 
reached). Thus it is suggested we only compute one time and saving the result. 
Fortunately, the above functor is pure, thus we can try to cache the above 
functor into a standard dictionary/map data structure, and thus *an AST* and 
*the set of live variable* as **key**, and *the set of live variable* before 
the AST as **value**. 
   
   
   The memorization strategy should be able to apply to any *pure* IRFunctor:
   ```cpp
   template <functor>
   class MemorizedVersion<functor> : functor
   ```
   
   During the computation of the liveness of the very beginning of the program, 
the whole program will be visited by the Functor and thus will be cached into 
the dictionary.
   
   However, we may only need the information when fixpoint is reached, which 
means we can keep forgetting the old value every time a new pair of *AST* and 
*PostAlive* is computed while the *AST* has already once inserted into the 
dictionary with other *PostAlive'*. 
   
   Ultimately, we want to transform the above dictionary into another 
dictionary taking *AST* as **key**, and the *live variable before executing 
AST* and *live variable after executing AST* as **value**. Of course, this 
dictionary has a clear specification that "only when the end of the whole 
program has empty set as live variables, each point in the AST has the live 
variable set as indicated in the dictionary".
   
   Also note that, it is possible to query everywhere in the program about the 
Liveness Infomation, as long as the Functor is properly defined everywhere.
   
   # Proposal
   
   We generalize the above functor into forward/backward data-flow analysis 
'framework' to remove boilerplate code as much as possible
   
   ```cpp
   // forward, functional, DFA
   template <class lattice,        // the type lattice, also the 
prestate/poststate type
         class ltProof,      // the proof that lattice is a lattice, a subtype 
of IsLattice<lattice>
         const ltProof& ltOp> // lattice Operations, bind the operation at 
compile time
   class ffdfa : public ExprFunctor<lattice(const Expr&, const lattice&)>,
                 public StmtFunctor<lattice(const Stmt&, const lattice&)> {
     static_assert(std::is_base_of<IsLattice<lattice>, ltProof>::value, 
"ltProof must be a class derived from IsLattice<>");
   ...
   }
   
   template<class T>
   class IsLattice { // classical lattice definition, T is lattice under the 
following operation
     public:
       // a least upperbound of two
       T join(const T& a, const T& b) = 0;
       // a greatest lower bound of two
       T meet(const T& a, const T& b) = 0;
       // highest
       const T& top() = 0;
       // whole set
       const T& bottom() = 0;
       bool equal(const T& a, const T& b) = 0;
   };
   ```
   
   I will indicate the usage of `IsLattice` in the comment section. This part 
(the way defining lattice) is actually quite fluid and more about the 
programming style.
   
   There will also be another class `bfdfa`.
   
   Memorization part:
   ```cpp
   
   template <typename R, // Return type of the functor that want to be memorized
               typename KType,  // key type of the dictionary
               MemorizingPolicy m, // currently only have policy 'override'
               typename ParentFunctor, // the functor that want to be memorized
               typename ...Args, 
               typename ...InitArgs> // initialization argument to used when 
initializing the dictionary
   class RecordingExprFunctor<KType(const Expr& n, Args... args),  // expect  
function that map the input to the KeyType
                             ParentFunctor,
                             m> : public ParentFunctor { ...
   using ThisFunctionType = R(const Expr& n, Args... args);
   static_assert(std::is_base_of<ExprFunctor<ThisFunctionType>, 
ParentFunctor>::value, "ParentFunctor must be of ExprFunctor Type");
   
   
   // we propose to have a unique_ptr to the memorizing dictionary (named as 
"A", that record the evaluation of the functor), so that reuse of 
MemorizedVersion will not act on the already filled dictionary (after each 
memorization, std::move it out).
   
   // we will have another dictionary (called "B" here), that map KeyType to 
the input type
   
   
   R VisitExpr(const Expr& n, Args... args) override;
   
   // Everytime VisitExpr is called with input ‘x’, it will first check if the 
current dict B has the KeyType and if it has and the key is also mapped to the 
"same" input 'x', then we can just query the dict A
   // Otherwise we need to evaluate and according to the above MemorizingPolicy 
'override', we will insert(replace) the new(already existent) the evaluated 
value into dict A
   
   // Designed in this twisted way so that if KeyType = AST, InputType=<AST x 
Lattice>, we can capture the final value of the fixpoint computation.
   
   // Of course here if KeyType = InputType, it will become trivial 
memorization and don't need dict B
   }
   
   // Similar for StmtFunctor
   
   // Then IRFunctor
   template <typename R, 
               typename KType, 
               MemorizingPolicy m,
               typename ParentFunctor,
               typename ...Args, 
               typename ...InitArgs>
   class RecordingIRFunctor<KType(const NodeRef& n, Args... args), 
                             ParentFunctor,
                             m> : RecordingExprFunctor<KType(const Expr& n, 
Args... args), ParentFunctor, m>,
                                  RecordingStmtFunctor<KType(const Stmt& n, 
Args... args), ParentFunctor, m> {...
   }
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

[GitHub] [incubator-tvm] DKXXXL opened a new issue #4468: [RFC] Data-flow Analysis Functionality on TVM IR

Reply via email to