Hi Sean,
Thanks for this nice write-up. I observed that when I evaluate an expression
like 'expr variable', LLDB runs the target as if doing an inferior function
call and then runs the interpreter too. I was wondering whether this is
expected behaviour or whether I am missing something.

Thanks,
Abid

From: [email protected] [mailto:[email protected]] On 
Behalf Of Sean Callanan
Sent: 17 August 2013 02:13
To: [email protected]
Subject: [lldb-dev] LLDB expression parser presentation

This is the outline of a brief presentation I gave on the LLDB expression 
parser.
I've included some "thorny issues;" if we resolve these, the expression parser 
will get a lot better.
Please let me know if you have any questions.

Class layout
          The master - ClangExpressionParser manages Clang and LLVM to compile 
a single expression
          Its minions:
                      ClangExpression - a unit of parseable code
                                  ClangUserExpression - specialized for the 
case where we're using the "expr" command
                      ExpressionSourceCode - handles wrapping
                      ClangASTSource - resolves external variables
                                  ClangExpressionDeclMap - specialized for the 
current frame (if stopped at a particular location in the program being 
debugged)
                      IRForTarget - rewrites IR
                      ASTResultSynthesizer - makes the result
                      IRMemoryMap - manages memory that may or may not be in the 
program being debugged, or may be simulated by LLDB
                                  IRExecutionUnit - specialized to be able to 
interact with the JIT

Basic Expression Flow
          User enters the expression: (lldb) expr a + 2
          We wrap the expression: void expr(arg *) { a + 2; }

                      We wrap differently based on expression context.
                      If stopped in a C++ instance method, we wrap as 
$__lldb_class::$__lldb_expr(void *)
                      If stopped in an Objective-C instance method, we wrap as 
an Objective-C category
                      If stopped in regular C code, we wrap as 
$__lldb_expr(void*)
                      But we always parse in Objective-C++ mode.

                      Typical wrapped expression:
                                  #define ... // custom definitions provided by LLDB or the user
                                  void
                                  // $__lldb_class resolves to the type of *this in the current frame
                                  $__lldb_class::$__lldb_expr
                                              (void *$__lldb_arg)
                                  {
                                              // expression text goes here
                                  }
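
To make the wrapping step concrete, here is a minimal sketch of how the
wrapper text could be assembled. WrapContext and WrapExpression are
hypothetical names made up for this illustration, not the real
ExpressionSourceCode interface, and the Objective-C category case is left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical: where the process is stopped (Objective-C case omitted).
enum class WrapContext { CFunction, CppMethod };

// Hypothetical helper, not LLDB's actual ExpressionSourceCode API.
// Builds the wrapper text around the user's expression.
std::string WrapExpression(const std::string &body, WrapContext ctx,
                           const std::string &prefix = "") {
  assert(!body.empty() && "nothing to wrap");
  std::string out = prefix;  // custom #defines from LLDB or the user
  out += "void\n";
  out += (ctx == WrapContext::CppMethod) ? "$__lldb_class::$__lldb_expr"
                                         : "$__lldb_expr";
  out += "(void *$__lldb_arg)\n{\n    ";
  out += body;
  out += ";\n}\n";
  return out;
}
```

Calling WrapExpression("a + 2", WrapContext::CppMethod) yields text shaped
like the typical wrapped expression above, with the expression as the body.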

          We resolve externals: "a" => int &a;

                      This happens via a question-and-answer process with the 
Clang compiler through the clang::ExternalASTSource interface
                      FindExternalVisibleDeclsByName searches for "globals" 
(globals from the perspective of the expression; these may be locals in the 
current stack frame)
                      FindExternalLexicalDecls searches a single struct for all 
entities of a particular type
                      CompleteType ensures that a single struct has all of its 
contents
                      (These are useful because we lazily complete structs, 
providing a forward declaration first and only filling it in when needed)
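
The lazy-completion idea can be sketched roughly as follows. LazyStruct and
LazySource are hypothetical stand-ins invented for this example; they are not
Clang types, and the real clang::ExternalASTSource interface is considerably
richer than this.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a struct declaration; not a clang type.
struct LazyStruct {
  std::string name;
  std::vector<std::string> fields;  // stays empty until completed
  bool complete = false;
};

// Hypothetical stand-in for an external source that fills structs on demand.
class LazySource {
public:
  using Completer =
      std::function<std::vector<std::string>(const std::string &)>;

  explicit LazySource(Completer completer)
      : m_completer(std::move(completer)) {
    assert(m_completer && "a completer callback is required");
  }

  // Hand out a forward declaration only: a name with no contents.
  LazyStruct Declare(const std::string &name) const {
    LazyStruct decl;
    decl.name = name;
    return decl;
  }

  // Fill in the contents the first time they are needed; later calls are no-ops.
  void CompleteType(LazyStruct &decl) {
    if (decl.complete)
      return;
    decl.fields = m_completer(decl.name);
    decl.complete = true;
    ++m_completions;
  }

  int completions() const { return m_completions; }

private:
  Completer m_completer;
  int m_completions = 0;
};
```

The point of the sketch is only that the expensive lookup (the callback) runs
when the contents are first requested, and at most once per struct.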

                      clang::ASTImporter is responsible for transferring Decls 
from one ASTContext (e.g., the ASTContext for a DWARF file) to another (e.g., 
the AST context for an expression)
                      Our ClangASTImporter manages many of these ("Minions"), 
because there are many separate DWARF files containing debug information.
                      We need to be able to remember where things came from.

          We add the result: static int ret = a + 2;

                      This happens at the Clang AST level
                      We handle Lvalues and Rvalues differently.
                      For Lvalues, we store a pointer to them: T *$__result_ptr 
= ...
                      For Rvalues, we store the value itself: static T 
$__result = ... // static ensures the expression doesn't try to use a register 
or something silly like that
                      We also store persistent types at this stage, e.g. struct 
$my_foo { int a; int b; }
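
A small sketch of the lvalue/rvalue distinction in ordinary C++. The function
names are hypothetical, and plain identifiers are used instead of $__result
and $__result_ptr because '$' in identifiers is a compiler extension.

```cpp
#include <cassert>

// Rvalue case, e.g. "a + 2": keep the value itself in static storage.
// Plays the role of: static T $__result = ...
const int *StoreRvalueResult(int value) {
  static int result;
  result = value;
  return &result;
}

// Lvalue case, e.g. "a": keep a pointer to the existing object.
// Plays the role of: T *$__result_ptr = ...
int *StoreLvalueResult(int *object) {
  assert(object != nullptr && "an lvalue always names an object");
  int *result_ptr = object;
  return result_ptr;
}
```

In this sketch the rvalue result is a copy with its own address in memory,
while the lvalue result stays tied to the original object, so a write through
the returned pointer changes that object.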

          We rewrite the IR: *(arg+0) = *(arg+8)+2

                      The IR as emitted by Clang's CodeGen expects all external 
variables to be in symbols
                      This is inconvenient if they are e.g. in registers, since 
you can't link against a register
                      This is also inconvenient for expression re-use, for 
example as a breakpoint condition... we'd have to re-link each time
                      Our solution is to indirect variables through a struct 
passed into the expression (void *$__lldb_arg)

                      Materializer's job is to put all variables that aren't 
referred to by symbols into this struct
                      It will create temporary storage as necessary (e.g., to 
hold a variable value that was in a register)
                      After the expression runs, a Dematerializer tears down 
all temporary storage, and ensures that variables are updated to reflect the 
expression's side effects
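
Here is a simplified sketch of the struct indirection, following the
*(arg+0) = *(arg+8)+2 example. The offsets, the 64-bit slot size, and all
function names are assumptions made for the illustration; the values are stored
directly in the struct to mirror that one-line example, and only the result is
read back (a real Dematerializer also writes variables back and frees
temporary storage).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout matching the outline: result at +0, "a" at +8.
const std::size_t kResultOffset = 0;
const std::size_t kVarAOffset = 8;

// Materialize: copy the variable's value (say, read out of a register)
// into temporary storage inside the argument struct.
std::vector<std::uint8_t> Materialize(std::int64_t a_value) {
  std::vector<std::uint8_t> arg(16, 0);
  std::memcpy(arg.data() + kVarAOffset, &a_value, sizeof a_value);
  return arg;
}

// The rewritten expression body: *(arg+0) = *(arg+8) + 2.
// It touches nothing but the struct, so no symbols need to be linked.
void RunRewrittenExpr(void *lldb_arg) {
  assert(lldb_arg != nullptr);
  std::uint8_t *arg = static_cast<std::uint8_t *>(lldb_arg);
  std::int64_t a;
  std::memcpy(&a, arg + kVarAOffset, sizeof a);
  std::int64_t result = a + 2;
  std::memcpy(arg + kResultOffset, &result, sizeof result);
}

// Dematerialize (result only): read the result slot back out of the struct.
std::int64_t DematerializeResult(const std::vector<std::uint8_t> &arg) {
  std::int64_t result;
  std::memcpy(&result, arg.data() + kResultOffset, sizeof result);
  return result;
}
```

Because the compiled body depends only on the struct layout, the same code can
be re-run with a freshly materialized struct, which is what makes re-use as a
breakpoint condition possible without re-linking.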

                      The IRForTarget class does various cleanup to help 
RTDyldMemoryManager (ideally much of this shouldn't be necessary)
                      It resolves all external symbols to avoid forcing 
RTDyldMemoryManager to resolve symbols
                      It creates a string and float literal pool so 
RTDyldMemoryManager doesn't have to relocate the constant pool
                      It strips off nasty Objective-C metadata so 
RTDyldMemoryManager doesn't have to look at it

          We interpret or execute the result: (int)$0 = 6

                      IRExecutionUnit contains a module and the (real or 
simulated) memory it uses

                      IRInterpreter can interpret a module without ever running 
the underlying process
                      It emulates IR instructions one by one
                      It uses lldb_private::Scalar to hold intermediate values, 
which is kinda limiting (no vectors, no FP math)
                      IRExecutionUnit simulates memory allocation etc. so we 
can do a lot of pointer magic

                      If the IRInterpreter can't run, the MCJIT produces 
machine code and LLDB runs it
                      IRExecutionUnit vends a custom JITMemoryManager 
implementation
                      It remembers memory allocations and where functions were 
placed
                      After JIT, all sections are placed into the target and we 
report their new locations with mapSectionAddress
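
The interpret-first, JIT-as-fallback decision can be sketched with a toy
instruction set. Everything here is hypothetical: the three opcodes, the
stack-based evaluation (LLVM IR is not stack-based), and the names. It only
illustrates the control flow of checking whether the interpreter can handle
the code before falling back to compiled execution.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mini instruction set; FAdd stands for "something unsupported".
enum class Op { PushInt, Add, FAdd };

struct Inst {
  Op op;
  std::int64_t imm;  // only used by PushInt
};

// The interpreter refuses anything it cannot emulate (here: FP math).
bool CanInterpret(const std::vector<Inst> &code) {
  for (const Inst &inst : code)
    if (inst.op == Op::FAdd)
      return false;
  return true;
}

// Emulate the instructions one by one, without running any target process.
std::int64_t Interpret(const std::vector<Inst> &code) {
  assert(CanInterpret(code));
  std::vector<std::int64_t> stack;
  for (const Inst &inst : code) {
    switch (inst.op) {
    case Op::PushInt:
      stack.push_back(inst.imm);
      break;
    case Op::Add: {
      assert(stack.size() >= 2);
      std::int64_t rhs = stack.back();
      stack.pop_back();
      stack.back() += rhs;
      break;
    }
    case Op::FAdd:
      break;  // not reached: rejected by CanInterpret
    }
  }
  assert(!stack.empty());
  return stack.back();
}

enum class Strategy { Interpret, JIT };

// Prefer interpretation; fall back to JIT-compiled code otherwise.
Strategy ChooseStrategy(const std::vector<Inst> &code) {
  return CanInterpret(code) ? Strategy::Interpret : Strategy::JIT;
}
```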

Selected Thorny Issues (concentrating on JIT-related issues)
          Make the MCJIT more robust so we can rely on it more
                      Support all Mach-O and ELF relocation types
                      Don't assume resolved symbols are in the current process
                      Don't assume addresses fit into void*s
          Make the IRInterpreter support all data types and instructions
                      Completely replace the LLVM interpreter!
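
As an illustration of the "addresses fit into void*s" problem: when the
debugger and the target have different pointer sizes, target addresses are
safer kept in a fixed-width integer than in a host pointer. The type and
function names below are hypothetical and only sketch the idea.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical name: a target address is always 64 bits wide here,
// independent of the pointer size of the host running the debugger.
typedef std::uint64_t target_addr_t;

// True if the address could be stored in a host pointer without truncation.
// On a 32-bit host this is false for any target address above 4 GiB.
bool FitsInHostPointer(target_addr_t addr) {
  return addr <= std::numeric_limits<std::uintptr_t>::max();
}

// Offset arithmetic stays in the 64-bit target type, never in void*.
target_addr_t AddOffset(target_addr_t base, std::uint64_t offset) {
  assert(offset <= std::numeric_limits<target_addr_t>::max() - base &&
         "target address overflow");
  return base + offset;
}
```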

Sean

_______________________________________________
lldb-dev mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
