Thanks for starting on this! On Nov 15, 2010, at 3:19 PM, Zhanyong Wan wrote:
> Author: wan > Date: Mon Nov 15 17:19:25 2010 > New Revision: 119288 > > URL: http://llvm.org/viewvc/llvm-project?rev=119288&view=rev > Log: > Adds a README for the Clang static analyzer to document its design and work > flow. The contents are taken from email notes by kremenek and xuzhongxing. > > Added: > cfe/trunk/lib/Checker/README.txt > > Added: cfe/trunk/lib/Checker/README.txt > URL: > http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Checker/README.txt?rev=119288&view=auto > ============================================================================== > --- cfe/trunk/lib/Checker/README.txt (added) > +++ cfe/trunk/lib/Checker/README.txt Mon Nov 15 17:19:25 2010 > @@ -0,0 +1,117 @@ > +//===----------------------------------------------------------------------===// > +// Clang Static Analyzer > +//===----------------------------------------------------------------------===// > + > += Library Structure = > + > +The analyzer library has two layers: a (low-level) static analysis > +engine (GRExprEngine.cpp and friends), and some static checkers > +(*Checker.cpp). The latter are built on top of the former via the > +Checker and CheckerVisitor interfaces (Checker.h and > +CheckerVisitor.h). The Checker interface is designed to be minimal > +and simple for checker writers, and attempts to isolate them from much > +of the gore of the internal analysis engine. > + > += How It Works = > + > +The analyzer is inspired by several foundational research papers ([1], > +[2]). (FIXME: kremenek to add more links) > + > +In a nutshell, the analyzer is basically a source code simulator that > +traces out possible paths of execution. The state of the program > +(values of variables and expressions) is encapsulated by the state > +(GRState). A location in the program is called a program point > +(ProgramPoint), and the combination of state and program point is a > +node in an exploded graph (ExplodedGraph). The term "exploded" comes > +from exploding the control-flow edges in the control-flow graph (CFG). > + > +Conceptually the analyzer does a reachability analysis through the > +ExplodedGraph. We start at a root node, which has the entry program > +point and initial state, and then simulate transitions by analyzing > +individual expressions. The analysis of an expression can cause the > +state to change, resulting in a new node in the ExplodedGraph with an > +updated program point and an updated state. A bug is found by hitting > +a node that satisfies some "bug condition" (basically a violation of a > +checking invariant). > + > +The analyzer traces out multiple paths by reasoning about branches and > +then bifurcating the state: on the true branch the conditions of the > +branch are assumed to be true and on the false branch the conditions > +of the branch are assumed to be false. Such "assumptions" create > +constraints on the values of the program, and those constraints are > +recorded in the GRState object (and are manipulated by the > +ConstraintManager). If assuming the conditions of a branch would > +cause the constraints to be unsatisfiable, the branch is considered > +infeasible and that path is not taken. This is how we get > +path-sensitivity. We reduce exponential blow-up by caching nodes. If > +a new node with the same state and program point as an existing node > +would get generated, the path "caches out" and we simply reuse the > +existing node. Thus the ExplodedGraph is not a DAG; it can contain > +cycles as paths loop back onto each other and cache out. > + > +GRState and ExplodedNodes are basically immutable once created. Once > +one creates a GRState, you need to create a new one to get a new > +GRState. This immutability is key since the ExplodedGraph represents > +the behavior of the analyzed program from the entry point. To > +represent these efficiently, we use functional data structures (e.g., > +ImmutableMaps) which share data between instances. > + > +Finally, individual Checkers work by also manipulating the analysis > +state. The analyzer engine talks to them via a visitor interface. > +For example, the PreVisitCallExpr() method is called by GRExprEngine > +to tell the Checker that we are about to analyze a CallExpr, and the > +checker is asked to check for any preconditions that might not be > +satisfied. The checker can do nothing, or it can generate a new > +GRState and ExplodedNode which contains updated checker state. If it > +finds a bug, it can tell the BugReporter object about the bug, > +providing it an ExplodedNode which is the last node in the path that > +triggered the problem. > + > += Working on the Analyzer = > + > +If you are interested in bringing up support for C++ expressions, the > +best place to look is the visitation logic in GRExprEngine, which > +handles the simulation of individual expressions. There are plenty of > +examples there of how other expressions are handled. > + > +If you are interested in writing checkers, look at the Checker and > +CheckerVisitor interfaces (Checker.h and CheckerVisitor.h). Also look > +at the files named *Checker.cpp for examples on how you can implement > +these interfaces. > + > += Debugging the Analyzer = > + > +There are some useful command-line options for debugging. For example: > + > +$ clang -cc1 -help | grep analyze > + -analyze-function <value> > + -analyzer-display-progress > + -analyzer-viz-egraph-graphviz > + ... > + > +The first allows you to specify only analyzing a specific function. > +The second prints to the console what function is being analyzed. The > +third generates a graphviz dot file of the ExplodedGraph. This is > +extremely useful when debugging the analyzer and viewing the > +simulation results. > + > +Of course, viewing the CFG (Control-Flow Graph) is also useful: > + > +$ clang -cc1 -help | grep cfg > + -cfg-add-implicit-dtors Add C++ implicit destructors to CFGs for all > analyses > + -cfg-add-initializers Add C++ initializers to CFGs for all analyses > + -cfg-dump Display Control-Flow Graphs > + -cfg-view View Control-Flow Graphs using GraphViz > + -unoptimized-cfg Generate unoptimized CFGs for all analyses > + > +-cfg-dump dumps a textual representation of the CFG to the console, > +and -cfg-view creates a GraphViz representation. > + > += References = > + > +[1] Precise interprocedural dataflow analysis via graph reachability, > + T Reps, S Horwitz, and M Sagiv, POPL '95, > + http://portal.acm.org/citation.cfm?id=199462 > + > +[2] A memory model for static analysis of C programs, Z Xu, T > + Kremenek, and J Zhang, http://lcs.ios.ac.cn/~xzx/memmodel.pdf > > > _______________________________________________ > cfe-commits mailing list > [email protected] > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits _______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
