Heya,

first draft of a document many users of our tools have been requesting ;)

This is a first draft where I basically dumped what I would have
wished for when I started out, but I might be missing stuff, or be
unaware of similar documentation that already exists somewhere.

If you think this is a good idea to get out, let me know what other
topics you want to see covered in this document. To me it'll be
required reading before starting the next document about the AST
matchers, which require some basic knowledge about the Clang AST...

Feedback welcome!
/Manuel
Title: Mastering the Clang AST

Mastering the Clang AST

This document gives a gentle introduction to the mysteries of the Clang AST. It is targeted at developers who either want to contribute to Clang or use tools that are based on Clang's AST, like the AST matchers.

Introduction

Clang's AST is different from ASTs of other languages in that it closely resembles both the written C++ code and the C++ standard. For example, parenthesized expressions and compile-time constants are available in an unreduced form in the AST. This makes Clang's AST a good fit for refactoring tools.

Documentation for all Clang AST nodes is available via the generated Doxygen. The online Doxygen documentation is also indexed by your favorite search engine, so searching for clang and the AST node's class name usually turns up the Doxygen page of the class you're looking for (for example, search for: clang ParenExpr).

Examining the AST

A good way to familiarize yourself with the Clang AST is to actually look at it on some simple example code. Clang has a built-in AST dump mode, which can be enabled with the flags -ast-dump and -ast-dump-xml.

Let's look at a simple example AST:

$ cat test.cc
int f(int x) {
  int result = (x / 42);
  return result;
}

# By default, clang acts as a driver for many tools; -cc1 tells it to run
# the compiler frontend directly.
$ clang -cc1 -ast-dump-xml test.cc
... cutting out internal declarations of clang ...
<TranslationUnit ptr="0x4871160">
 <Function ptr="0x48a5800" name="f" prototype="true">
  <FunctionProtoType ptr="0x4871de0" canonical="0x4871de0">
   <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
   <parameters>
    <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
   </parameters>
  </FunctionProtoType>
  <ParmVar ptr="0x4871d80" name="x" initstyle="c">
   <BuiltinType ptr="0x4871250" canonical="0x4871250"/>
  </ParmVar>
  <Stmt>
(CompoundStmt 0x48a5a38 <t2.cc:1:14, line:4:1>
  (DeclStmt 0x48a59c0 <line:2:3, col:24>
    0x48a58c0 "int result =
      (ParenExpr 0x48a59a0 <col:16, col:23> 'int'
        (BinaryOperator 0x48a5978 <col:17, col:21> 'int' '/'
          (ImplicitCastExpr 0x48a5960 <col:17> 'int' <LValueToRValue>
            (DeclRefExpr 0x48a5918 <col:17> 'int' lvalue ParmVar 0x4871d80 'x' 'int'))
          (IntegerLiteral 0x48a5940 <col:21> 'int' 42)))")
  (ReturnStmt 0x48a5a18 <line:3:3, col:10>
    (ImplicitCastExpr 0x48a5a00 <col:10> 'int' <LValueToRValue>
      (DeclRefExpr 0x48a59d8 <col:10> 'int' lvalue Var 0x48a58c0 'result' 'int'))))

  </Stmt>
 </Function>
</TranslationUnit>

In general, -ast-dump-xml dumps declarations as XML and statements as S-expressions. The top-level declaration in a translation unit is always the translation unit declaration. In this example, our first user-written declaration is the function declaration of 'f'. The body of 'f' is a compound statement, whose child nodes are a declaration statement that declares our result variable, and the return statement.
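
To make the nesting in the statement part of the dump easier to follow, here is a minimal, self-contained C++ sketch that rebuilds the initializer (x / 42) as a small tree and prints it as an S-expression. The Node struct and the helpers make, with, dump and buildExample are hypothetical names invented for this illustration; they are not Clang classes, and the output leaves out the pointers, source ranges and types that Clang prints.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an AST node (not a Clang class): a kind name,
// an optional detail (operator, cast kind, referenced name, literal value),
// and the child nodes.
struct Node {
  std::string Kind;
  std::string Detail;
  std::vector<std::unique_ptr<Node>> Children;
};

// Creates a node without children.
std::unique_ptr<Node> make(std::string Kind, std::string Detail = "") {
  auto N = std::make_unique<Node>();
  N->Kind = std::move(Kind);
  N->Detail = std::move(Detail);
  return N;
}

// Attaches Child to Parent and hands Parent back, so trees can be built inline.
std::unique_ptr<Node> with(std::unique_ptr<Node> Parent,
                           std::unique_ptr<Node> Child) {
  Parent->Children.push_back(std::move(Child));
  return Parent;
}

// Renders the tree as a nested S-expression: one node per line, two spaces
// of indentation per nesting level, closing parentheses on the last line.
std::string dump(const Node &N, unsigned Depth = 0) {
  std::string Out(Depth * 2, ' ');
  Out += "(" + N.Kind;
  if (!N.Detail.empty())
    Out += " " + N.Detail;
  for (const auto &Child : N.Children)
    Out += "\n" + dump(*Child, Depth + 1);
  return Out + ")";
}

// Builds the initializer of `int result = (x / 42);` using the node kinds
// that appear in the dump above. The parentheses stay in the tree as their
// own node, and the literal 42 is kept as written.
std::unique_ptr<Node> buildExample() {
  auto Cast = with(make("ImplicitCastExpr", "<LValueToRValue>"),
                   make("DeclRefExpr", "'x'"));
  auto Div = with(with(make("BinaryOperator", "'/'"), std::move(Cast)),
                  make("IntegerLiteral", "42"));
  return with(make("ParenExpr"), std::move(Div));
}
```

Calling dump(*buildExample()) yields a string with the same shape as the ParenExpr subtree in the dump: each additional level of indentation is one step further down in the tree.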

AST Context

All information about the AST for a translation unit is bundled up in the class ASTContext. It allows traversing the whole translation unit starting from getTranslationUnitDecl, and gives access to Clang's table of identifiers for the parsed translation unit.

AST Nodes

Clang's AST nodes are modeled on a type hierarchy that does not have a common ancestor. Instead, there are multiple larger hierarchies for basic node types like Decl and Stmt. Many other nodes in the AST are not part of a larger hierarchy, and are only reachable from specific other nodes, like CXXBaseSpecifier.

Thus, to traverse the full AST, one starts from the TranslationUnitDecl and then recursively traverses everything that can be reached from that node - this information has to be encoded for each specific node type. This algorithm is encoded in the RecursiveASTVisitor. See the RecursiveASTVisitor tutorial.

The two most basic nodes in the Clang AST are statements (Stmt) and declarations (Decl). Note that expressions (Expr) are also statements in Clang's AST.
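
The lack of a common ancestor has a practical consequence for traversal, which the following self-contained C++ sketch tries to illustrate. The classes only borrow their names from Clang; they are heavily simplified stand-ins rather than the real Clang classes, and visitStmt, visitDecl and traverseExample are hypothetical helpers. Because a Decl and a Stmt share no base class, the traversal needs one entry point per hierarchy and has to know, for each node type, how to reach its children.

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Simplified stand-ins, not the real Clang classes. Note that there is no
// shared "Node" base class: Stmt and Decl are two independent hierarchies.
struct Stmt {
  virtual ~Stmt() = default;
  virtual std::string name() const = 0;
  std::vector<std::unique_ptr<Stmt>> Children;
};

// As in the text, every expression is also a statement.
struct Expr : Stmt {};

struct IntegerLiteral : Expr {
  std::string name() const override { return "IntegerLiteral"; }
};
struct ReturnStmt : Stmt {
  std::string name() const override { return "ReturnStmt"; }
};
struct CompoundStmt : Stmt {
  std::string name() const override { return "CompoundStmt"; }
};

struct Decl {
  virtual ~Decl() = default;
  virtual std::string name() const = 0;
};
struct FunctionDecl : Decl {
  std::string name() const override { return "FunctionDecl"; }
  std::unique_ptr<Stmt> Body; // the edge from the Decl to the Stmt hierarchy
};
struct TranslationUnitDecl : Decl {
  std::string name() const override { return "TranslationUnitDecl"; }
  std::vector<std::unique_ptr<Decl>> Decls;
};

static_assert(std::is_base_of<Stmt, Expr>::value, "an Expr is a Stmt");
static_assert(!std::is_base_of<Decl, Stmt>::value, "Stmt is not a Decl");
static_assert(!std::is_base_of<Stmt, Decl>::value, "Decl is not a Stmt");

// Entry point for the Stmt hierarchy: records the node, then its children.
void visitStmt(const Stmt &S, std::vector<std::string> &Out) {
  Out.push_back(S.name());
  for (const auto &Child : S.Children)
    visitStmt(*Child, Out);
}

// Entry point for the Decl hierarchy. How to reach the children is encoded
// per node type: a translation unit owns declarations, a function owns a body.
void visitDecl(const Decl &D, std::vector<std::string> &Out) {
  Out.push_back(D.name());
  if (auto *TU = dynamic_cast<const TranslationUnitDecl *>(&D)) {
    for (const auto &Child : TU->Decls)
      visitDecl(*Child, Out);
  } else if (auto *FD = dynamic_cast<const FunctionDecl *>(&D)) {
    if (FD->Body)
      visitStmt(*FD->Body, Out);
  }
}

// Builds a tiny tree (a function whose body returns a literal) and returns
// the node names in the order in which the traversal reaches them.
std::vector<std::string> traverseExample() {
  auto Ret = std::make_unique<ReturnStmt>();
  Ret->Children.push_back(std::make_unique<IntegerLiteral>());
  auto Body = std::make_unique<CompoundStmt>();
  Body->Children.push_back(std::move(Ret));
  auto F = std::make_unique<FunctionDecl>();
  F->Body = std::move(Body);
  TranslationUnitDecl TU;
  TU.Decls.push_back(std::move(F));
  std::vector<std::string> Out;
  visitDecl(TU, Out);
  return Out;
}
```

Starting from the translation unit, traverseExample visits the nodes top-down and switches from visitDecl to visitStmt at the function body. This per-node-type knowledge is the kind of information that the RecursiveASTVisitor encodes for the real AST.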

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
