> On Jul 22, 2014, at 4:26 PM, Randy Smith <[email protected]> wrote:
> 
> Greg: Thanks very much for the detailed explanation!  As I mentioned in my 
> response to Reid, this does indeed seem related to the known 
> -fstandalone-debug issue.  I'd still like to dig down to the floor (i.e. to 
> the point where I understand this specific issue), with a vague hope that it 
> may be a reasonable thing for me to try and fix.  So I'd like to ask you a 
> couple of questions on your summary.
> 
> First question: Is there a tool to probe for symbol information (forward decl 
> vs. full information) in a shared library?  I see llvm-dwarfdump, but it 
> looks to be just dumping symbols rather than interpreting them.  


This really comes down to dumping the DWARF. We have a dwarfdump command on 
MacOSX. If you have access to a Mac, I can help you see just the information 
you want, since llvm-dwarfdump doesn't have the tools we need (look up a DWARF 
debug info entry (DIE) by name or by offset, dump a single DIE with its 
children/parents, etc).



> On Tue, Jul 22, 2014 at 4:09 PM, Greg Clayton <[email protected]> wrote:
> 
> > On Jul 22, 2014, at 11:41 AM, Randy Smith <[email protected]> wrote:
> >
> >
> > I'm chasing a crash in lldb, and my current "that doesn't seem right" has 
> > to do with a conflict between a decl and its origin decl (the 
> > transformation done at the beginning of 
> > tools/lldb/source/Expression/ClangASTSource.cpp:ClangASTSource::layoutRecordType()).
> >   So I'm trying to understand how decls and origin decls get setup during 
> > the symbol import process.  Can anyone give me a sketch/hand?  Specific 
> > questions include:
> > * There are multiple ASTContexts involved (e.g. the src and dst contexts in 
> > the signature of 
> > tools/lldb/source/Symbol/ClangASTImporter.cpp:ClangASTImporter::CopyType); 
> > do those map to compilation units, or to shared library modules?  Is there 
> > a simple way to tell what CU/.so an ASTContext maps to?
> 
> Every executable file is represented by a lldb_private::Module (this includes 
> both executables and shared libraries) and each lldb_private::Module has its 
> own ASTContext (one per module, and all compilation units are all represented 
> in one big ASTContext). The DWARF debug info is parsed and it creates types 
> in the ASTContext in the corresponding lldb_private::Module.
> 
> > * Does a decl always have an origin decl, even if it was loaded from an 
> > ASTContext (?) that has a complete definition?
> 
> Origin decl is so we know where a decl originally came from because the 
> definition might not yet be complete (think "class Foo;") and might need to 
> be completed. A little background on how we lazily parse classes.
> 
> When someone needs a type, we parse the type (SymbolFileDWARF::ParseType). If 
> that type is a class we always just parse a forward decl to the class ("class 
> Foo;"). The DWARF parser (SymbolFileDWARF) implements 
> clang::ExternalASTSource so it can complete a type only when the compiler 
> needs to know more. When the compiler or ClangASTType needs to know more 
> about a type it asks the type to get a complete version of itself and 
> SymbolFileDWARF::CompleteTagDecl is called to complete the type. We then 
> parse all ivars, methods, and everything else about a type. We also assist in 
> laying out the CXXRecordDecl by another callback 
> SymbolFileDWARF::LayoutRecordType (which is part of the 
> clang::ExternalASTSource). We need to assist in laying things out because the 
> DWARF debug info doesn't always include all required attributes or #pragma 
> information in order for us to create the types correctly. So this 
> SymbolFileDWARF::LayoutRecordType allows us to tell the compiler about the 
> offsets of ivars so they are always correct.
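
As a rough sketch of the lazy completion described above: this is a toy model, 
not LLDB or clang code. ToyRecord, ToyExternalSource and GetIvars are made-up 
names, and the real callbacks take clang decls rather than strings. It only 
shows the shape of the idea: a type starts as a forward decl and is filled in 
the first time someone needs its members.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record decl. This is not a clang class.
struct ToyRecord {
    std::string name;
    bool complete = false;           // false: only "class Foo;" is known
    std::vector<std::string> ivars;  // filled in when the record is completed
};

// Hypothetical stand-in for the external source that knows the full type.
class ToyExternalSource {
public:
    using Parser = std::function<std::vector<std::string>(const std::string &)>;

    explicit ToyExternalSource(Parser parser) : parser_(std::move(parser)) {}

    // Fill in the members the first time; later calls do nothing.
    void CompleteRecord(ToyRecord &record) {
        if (record.complete)
            return;
        record.ivars = parser_(record.name);
        record.complete = true;
        ++parse_count_;
    }

    int parse_count() const { return parse_count_; }

private:
    Parser parser_;
    int parse_count_ = 0;
};

// Anyone who needs the members goes through here, so completion is on demand.
inline const std::vector<std::string> &GetIvars(ToyRecord &record,
                                                ToyExternalSource &source) {
    source.CompleteRecord(record);
    assert(record.complete);
    return record.ivars;
}
```

Asking for the ivars twice only runs the (assumed) parser once, which is the 
point of parsing forward decls first and completing later.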
> 
> Back to origin decls: When running an expression we create a new ASTContext 
> that is for the expression only. decls are copied from the ASTContext for the 
> lldb_private::Module over into the ASTContext for the expression. When they 
> are copied, only forward decls are copied, and they may need to be 
> completed. When this happens we might need to ask the type in the original 
> ASTContext to complete itself so that we can copy a complete definition over 
> into the expression ASTContext. This is the reason we track the origin decls. 
> Sometimes you have a type that is only a forward decl, and that is ok as we 
> don't always have the full definition of a class.
> 
> > * When an origin decl is looked up, should all the types in it be 
> > completed, or might it have incomplete types?  It seems as if there is code 
> > assuming that these types will always be complete.
> 
> There are two forms of incomplete types:
> 1 - incomplete types that have full definitions and just haven't been 
> completed (and might have to find the original decl, ask it to complete 
> itself, then copy the origin decl when the current decl needs to be copied 
> from one AST to another)
> 2 - types that are actually forward declarations and will be told they are 
> just forward decls
> 
> So we sometimes do run into cases where we don't have the debug info for 
> something because the compiler pulled it out trying to minimize the debug 
> info.
> 
> >
> > Context (warning, gets detailed, possibly with irrelevant details because 
> > newbie): lldb is crashing in clang::ASTContext::getASTRecordLayout with the 
> > assertion "Cannot get layout of forward declarations!".  The type in 
> > question is an incomplete type (string16, aka. basic_string<unsigned short, 
> > ...>).  Normally clang::ASTContext::getASTRecordLayout() would call 
> > getExternalSource()->CompleteType() to complete the type, but in this case 
> > it isn't because the type is marked as !hasExternalLexicalStorage().
> 
> That means the type was not complete in the DWARF for the lldb_private::Module 
> it originates from.
> >
> > The *weird* thing is that the type has previously been completed, further 
> > up the stack, but in a different AST node (same name).  In more detail: 
> > Class A contains an instance of class B contains an instance of class C 
> > (==string16).  I'm seeing getASTRecordLayout called on class A, which then 
> > calls it (indirectly, through the EmptySubobjectMap constructor) on class B, 
> > which then calls it (ditto) on class C (all works).  Then the stack unwinds 
> > up to the B call, which proceeds to the Builder.Layout() line in that 
> > function.  It ends up (through the transformation mentioned above in 
> > clang::ClangASTSource::LayoutRecordType()) calling getASTRecordLayout() on 
> > the origin decl.  When it recurses down to class C, that node isn't 
> > complete, isn't completed, and causes an assertion.  So I'm trying to 
> > figure out whether the problem is that any decl hanging off an origin_decl 
> > should be complete, or that that node shouldn't be marked as 
> > !hasExternalLexicalStorage().  (Or something else; I've already gone 
> > through several twists and turns debugging this problem :-}.)
> 
> We have a problem in the compiler currently where for classes like:
> 
> class A : public B
> {
>     ...
> };
> 
> The compiler says "ahh, you didn't use class B so I am not going to emit 
> debug info for it.". This really can hose us up because we now create an 
> ASTContext for the expression and we want a definition for "A" and the user 
> wants to call a method that is in class "B", but we can't because the 
> compiler removed the definition. What we currently do is figure out that we 
> have a forward declaration to "B" only, and when we create type "A" in the 
> module's ASTContext, we say "B" is an empty class with no ivars and no 
> methods. To fix this, you can specify "-fstandalone-debug" to the clang 
> compiler to tell it not to do this removal of debug info for things that are 
> inherited from.
> 
> 
> The other problem we have is, say you have two modules "foo.dylib" and 
> "bar.dylib". Both have debug info: "foo.dylib" has a complete "A" and a 
> complete "B" definition, but "bar.dylib" has a complete "A" definition and 
> only a forward "B" definition. The ASTContext for foo.dylib believes class 
> "A" to look like it really is, and "bar.dylib" has a definition for "A" that 
> believes it inherits from an empty class with no ivars and no methods. Now we 
> write an expression that uses a variable in "foo.dylib" whose type is "A" 
> and one from "bar.dylib" whose type is "A". We try to copy the definition 
> for "A" from the source ASTContext in "foo.dylib" over into the expression 
> AST (this works), and then we try to copy the version from "bar.dylib" into 
> the expression context, and the AST copying code notices that the definitions 
> for class "A" don't match. The copy would have worked if the copies of "A" 
> were the same, and nothing would have been copied, but it fails when they 
> are different. This is a known limitation of using the clang ASTContext 
> classes to represent our types, and is also the reason "-fstandalone-debug" 
> is the default setting for clang on Darwin, and probably should be for anyone 
> else wanting to use lldb to debug.
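
To make the foo.dylib/bar.dylib mismatch concrete, here is a minimal sketch. 
It is a toy model and not how the clang AST copying code is implemented: 
ToyClassDef, ToyExpressionAST and CopyDefinition are made-up names, and a class 
is reduced to lists of ivar names. It only shows why a second "A" with an empty 
base cannot be merged with the first one.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, flattened view of a class definition. Not a clang type.
struct ToyClassDef {
    std::string name;
    std::vector<std::string> base_ivars;  // ivars inherited from the base class
    std::vector<std::string> ivars;       // ivars declared by the class itself
};

inline bool SameDefinition(const ToyClassDef &a, const ToyClassDef &b) {
    return a.name == b.name && a.base_ivars == b.base_ivars &&
           a.ivars == b.ivars;
}

enum class CopyResult { Copied, AlreadyPresent, Mismatch };

// Hypothetical stand-in for the expression's AST.
class ToyExpressionAST {
public:
    // The first definition of a name is copied in. A later identical one is a
    // no-op. A later different one is reported as a mismatch.
    CopyResult CopyDefinition(const ToyClassDef &def) {
        auto it = defs_.find(def.name);
        if (it == defs_.end()) {
            defs_.emplace(def.name, def);
            return CopyResult::Copied;
        }
        return SameDefinition(it->second, def) ? CopyResult::AlreadyPresent
                                                : CopyResult::Mismatch;
    }

private:
    std::map<std::string, ToyClassDef> defs_;
};
```

With "A" from foo.dylib carrying the base's ivars and "A" from bar.dylib 
carrying none, the second copy reports a mismatch in this model.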
> 
> So that sounds like it could be my situation (with A (defined in liba) 
> containing B (defined in libb) rather than inheriting from it, but I'd think 
> that'd be identical from a layout perspective).   But I'm not quite seeing 
> how that maps to the execution flow I'm seeing in my debugging.  If I 
> understand your description above correctly, what I was seeing was 
> CompleteType called on the forward decl of my A, and called successfully; 
> both A & B were fully populated.  But then later we got the origin decl for 
> A, and CompleteType was called on it, and B was not filled out in that.

If this is the case where A and B were complete in the source AST and copied to 
a destination AST and B wasn't able to be completed, we might just need to 
complete the inherited class B in the source AST prior to copying it to the 
dest AST. I would be very surprised if this is the issue, though, since we 
wouldn't be able to complete class A without first having completed class B in 
the source AST.

>  Is it that the first CompleteType was done in the expression ASTContext 
> (which presumably has access to search all the library ASTContexts) and the 
> second one was done in the context of the liba ASTContext, and so didn't have 
> access to the libb information?  And if so, why isn't the first one strictly 
> better?

So currently everything _only_ has visibility into its own AST when making 
types within an AST. So if liba has a complete A but a forward B, that is how 
the type would be represented in liba. When we are displaying a type later, we 
are able to grab the type from any AST if we know it is a forward decl. But if 
liba has a complete A and it inherits from a forward decl B, we will tell B 
within liba that it is complete and has no ivars or methods. Otherwise the 
clang code that we use to build the module's AST will assert and kill your 
program because it is unhappy with the class you are trying to create...

>  
> 
> > The crash is reproducible, but one of the reproduction steps is "Build 
> > chrome", so I figured I'd work on it some myself to teach myself lldb 
> > rather than try to file a bug on it.   The wisdom of that choice in 
> > question :-}.
> >
> > Any thoughts anyone has would be welcome.
> 
> So try things out with -fstandalone-debug and see if that fixes your 
> problems. If it does it gives us a workaround for now, but we should really 
> be fixing any crashing bugs that occur due to this kind of issue in LLDB in 
> the long run.
> 
> Do you have a sense of what the proper fix would be?

Just make sure LLDB does the best it can with the information it is given. In 
the above case as described, if we have a full A and forward decl B, we end up 
with the notion that we have:

class B {};

class A : public B {
    ... all ivars and methods for A
};

So we lose debugging fidelity because none of the debug info for B is around.

>  In the previous thread I think you indicated that the compiler should emit 
> debug information a la' -fstandalone-debug, and the linker should collapse 
> the information back down, but in this case it seems like the debugger should 
> be able to find the information in the other shared library (though I do 
> understand that there's a more general problem that doesn't solve, when the 
> debugging information isn't emitted anywhere for a particular class).

If there is a full definition for B _somewhere_ in liba, then we are good and 
this should work. If it isn't working this is the bug we need to fix. But if B 
is in another library like libb, then as far as we know for the type of A 
within liba, B is a forward declaration or just an empty base class.

Everything within a module is self contained, so all types are only derived 
from types from the current module. We have to keep things this way because you 
might unload libb.dylib and reload a newer version of libb.dylib. If we allowed 
modules to grab information from other modules, then we would have a large 
dependency graph to follow when a module is replaced... So if we copied a copy 
of B from libb.dylib before it was rebuilt, then we start debugging something 
that uses liba.dylib, and then libb.dylib gets reloaded... Which version of "B" 
do you want if A hasn't been updated? The old "B" or the new "B"? And who is to 
say that the version of "B" that we imported from libb.dylib was correct in the 
first place? Maybe someone built liba.dylib when B looked like:

class B {
public:
    int m_int;
};

but libb.dylib was rebuilt so it now looks like:

class B {
public:
    int m_int[32];
};

But you still start a debug session with the liba.dylib that was built with the 
old B, and you pull in the debug info from the new libb.dylib.... You see where 
I am going with this? The only thing we can trust as far as debug information 
goes is the binary itself and its debug info. That guarantees we are as correct 
as possible, and keeps us from having to try and track dependencies between 
modules.
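
A small sketch of what goes wrong if the two versions of B get mixed. The 
namespaces and the A with an m_after member are made up for illustration, and A 
contains B here instead of inheriting from it, just so offsetof is well 
defined. The two B definitions are the ones above.

```cpp
#include <cassert>
#include <cstddef>

// The layout liba.dylib was actually compiled against.
namespace built_against {
struct B { int m_int; };
struct A { B m_b; int m_after; };
}  // namespace built_against

// The layout described by the debug info in the rebuilt libb.dylib.
namespace rebuilt {
struct B { int m_int[32]; };
struct A { B m_b; int m_after; };
}  // namespace rebuilt

// Where m_after really lives in memory laid out by the old liba.dylib.
inline std::size_t OffsetInBinary() {
    return offsetof(built_against::A, m_after);
}

// Where a debugger would look if it trusted the new definition of B.
inline std::size_t OffsetFromNewDebugInfo() {
    return offsetof(rebuilt::A, m_after);
}
```

The two offsets differ (sizeof(int) versus 32 * sizeof(int)), so a debugger 
that borrowed the new B would read m_after from the wrong place.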

One thing that is important to understand: when you display variables, we can 
pull information from any module. So if you have a class C:

class C {
public:
    B *m_b;
};

C c;


When we display this using "frame variable c" or using "expression c", and we 
try to display "B *m_b", we will ask the class B if it is a forward decl. If 
it is, the frame variable code will search all modules from the target we are 
using to debug (usually a couple of hundred different shared libraries) for the 
real definition of "B" and then use that when we try to expand "m_b" so we can 
view its ivars. So the variable display code knows how to always look for the 
real definition of things, but the type within each clang AST will only have 
visibility into its own module.
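
Roughly, the display-time lookup behaves like the sketch below. This is a toy 
model and not the LLDB API: ToyType, ToyModule, FindFullDefinition and 
ResolveForDisplay are made-up names, and a module is reduced to a map from type 
name to a flattened type.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical flattened type. Not an LLDB or clang class.
struct ToyType {
    std::string name;
    bool is_forward_decl;
    std::vector<std::string> ivars;
};

// Hypothetical module: one self-contained set of types per shared library.
struct ToyModule {
    std::string file;
    std::map<std::string, ToyType> types;
};

// Search every module in the target for a full definition of `name`.
inline const ToyType *FindFullDefinition(const std::vector<ToyModule> &modules,
                                         const std::string &name) {
    for (const ToyModule &module : modules) {
        auto it = module.types.find(name);
        if (it != module.types.end() && !it->second.is_forward_decl)
            return &it->second;
    }
    return nullptr;
}

// Display code: use the local type unless it is only a forward decl.
inline const ToyType &ResolveForDisplay(const ToyType &local,
                                        const std::vector<ToyModule> &modules) {
    if (!local.is_forward_decl)
        return local;
    const ToyType *full = FindFullDefinition(modules, local.name);
    return full ? *full : local;
}
```

Note that the module's own type is never modified in this model. Only the 
display step looks elsewhere, which matches the split described above.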

One unfortunate side effect of having to complete "B" for class "A" shows up 
when it looks like:

class A : public B {
    ... all ivars and methods for A
};

We told B it was complete and has no ivars or methods to keep clang happy so it 
doesn't assert and kill the debugger. So any other variables within that same 
module that have a "B *" ivar that was just a forward decl will think they have 
the complete definition of "B". Part of the solution to the issues you are 
running into is to mark the record decl for "B" in a way that says "I had to 
complete this type by telling it that it has no ivars or methods, but it was 
really a forward decl". That way when we try to display a type C from above (if 
C comes from a module with a full A that inherits from a forward B), we know to 
still try and find the full definition of "B" from somewhere else.
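
One possible shape for that marker, as a sketch only. Nothing here exists in 
LLDB: the Completeness enum and the three helper functions are hypothetical 
names. The idea is just a third state between "forward decl" and "complete".

```cpp
#include <cassert>

// Hypothetical three-state marker for a record decl in a module's AST.
enum class Completeness {
    ForwardDecl,  // only "class B;" is known
    ForcedEmpty,  // was a forward decl, completed as an empty class for clang
    Complete      // full definition came from this module's debug info
};

// A base class must have a definition, so a forward decl gets forced empty.
inline Completeness PrepareForUseAsBase(Completeness state) {
    return state == Completeness::ForwardDecl ? Completeness::ForcedEmpty
                                              : state;
}

// clang only needs to see some definition to lay out the derived class.
inline bool HasDefinitionForClang(Completeness state) {
    return state != Completeness::ForwardDecl;
}

// Display code should keep looking unless the definition is the real one.
inline bool ShouldSearchOtherModules(Completeness state) {
    return state != Completeness::Complete;
}
```

With a plain complete/incomplete bool, the forced-empty B would look complete 
and the search would be skipped. The extra state keeps both answers available.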

I hope this clears up some of the reasons for the way things are and helps you 
understand more the scope of the problem.

Greg


_______________________________________________
lldb-dev mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
