Re: [PATCH] Have clang list the imported modules in the debug info

Adrian Prantl Wed, 18 Mar 2015 17:23:16 -0700

> On Mar 18, 2015, at 5:03 PM, David Blaikie <[email protected]> wrote:
> 
> 
> 
> On Wed, Mar 18, 2015 at 4:53 PM, Adrian Prantl <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>> On Mar 18, 2015, at 4:41 PM, David Blaikie <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> 
>> 
>> On Wed, Mar 18, 2015 at 4:31 PM, Adrian Prantl <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>>> On Mar 18, 2015, at 4:02 PM, David Blaikie <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> 
>>> 
>>> On Wed, Mar 18, 2015 at 3:50 PM, Adrian Prantl <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>>> On Mar 17, 2015, at 6:44 PM, David Blaikie <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> 
>>>> 
>>>> On Tue, Mar 17, 2015 at 3:47 PM, Adrian Prantl <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> > On Mar 17, 2015, at 10:03 AM, Greg Clayton <[email protected] 
>>>> > <mailto:[email protected]>> wrote:
>>>> >
>>>> >
>>>> >> On Mar 17, 2015, at 9:46 AM, David Blaikie <[email protected] 
>>>> >> <mailto:[email protected]>> wrote:
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Tue, Mar 17, 2015 at 9:42 AM, Greg Clayton <[email protected] 
>>>> >> <mailto:[email protected]>> wrote:
>>>> >>
>>>> >>> On Mar 16, 2015, at 6:47 PM, David Blaikie <[email protected] 
>>>> >>> <mailto:[email protected]>> wrote:
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On Mon, Mar 16, 2015 at 5:14 PM, Adrian Prantl <[email protected] 
>>>> >>> <mailto:[email protected]>> wrote:
>>>> >>>
>>>> >>> Thanks for the explanation David, I missed that it is entirely the 
>>>> >>> linker's (or some dwarf post-processor's) responsibility to find the 
>>>> >>> module files and link in the debug info from the .pcm files, so 
>>>> >>> the debugger doesn’t notice a difference.
>>>> >>>
>>>> >>> I think there's still some confusion here. Sorry if I'm rehashing 
>>>> >>> something, but I'll try to explain how this all works.
>>>> >>>
>>>> >>> Normal split DWARF:
>>>> >>>
>>>> >>> Compiler generates two files: .o and .dwo.
>>>> >>> .dwo has static, non-relocatable debug info.
>>>> >>> .o has a skeleton compile_unit that has the name of the .dwo file and 
>>>> >>> a hash to verify that the .dwo file isn't stale when the debugger 
>>>> >>> reads it.
>>>> >>> The .o files are all linked together, the .dwo files stay where they 
>>>> >>> are.
>>>> >>> The debugger reads the linked executable, finds the skeleton 
>>>> >>> compile_units contained therein, and finds/loads the .dwo files.
>>>> >>>
>>>> >>> The scenario I have in mind for module debug info is this:
>>>> >>> Module is compiled as an object file with debug info (this file is 
>>>> >>> actually a .dwo file, even if it has some other extension - it has the 
>>>> >>> non-relocatable debug info in it)
>>>> >>> .o file has a comdat'd skeleton compile_unit describing the 
>>>> >>> .dwo/module file
>>>> >>> <from here on no extra work is required, the linker and debugger just 
>>>> >>> act as normal>
>>>> >>> The .o files are linked together, the skeleton compile_units get 
>>>> >>> deduplicated by the linker (comdat sections)
>>>> >>
>>>> >> One issue I can think of is we will need to figure out a way to make 
>>>> >> COMDAT work with mach-o. COMDAT requires a large number of sections and 
>>>> >> mach-o can only have 255.
>>>> >>
>>>> >> Ah, fair enough - how does MachO handle inline functions (the most 
>>>> >> common use of comdat) currently, then?
>>>> >
>>>> > Currently mach-o relies on symbols in the symbol table being marked as 
>>>> > weak and I believe the data for these symbols are in special sections 
>>>> > that are marked as containing items that can be coalesced.
>>>> >
>>>> That’s not necessarily an issue that needs to be solved on Darwin, or am I 
>>>> maybe missing something? The linker leaves all debug info in the .o (as it 
>>>> currently does) and llvm-dsymutil is resolving all the external module 
>>>> type references while creating the .dSYM bundle.
>>>> 
>>>> Yeah, with a debug aware linker (or in the case of dsymutil, a debug-only 
>>>> linker) you would just know that since you're looking at object files, 
>>>> module references will be redundant across objects and should be 
>>>> deduplicated (by the dwo hash, most likely).
>>>> 
>>>> If you're not teaching your debugger to read modules, and want to link the 
>>>> debug info in from the .dwos - at that point you can probably drop the 
>>>> skeleton stuff entirely (you'd still need to teach your debugger about 
>>>> .dwo sections and some of the esoteric things there - like str_index and 
>>>> the extra/special line table just for file names (decl_file, etc, uses 
>>>> this)) and just put the contents of the module debug info straight in the 
>>>> dsym. It'd be a bit weird, but do-able without too much work, I'd imagine. 
>>>> You could move them back into the original sections, if you wanted to 
>>>> avoid the weird .dwo +non-.dwo sections together... *shrug* not sure what 
>>>> exactly you'd want there.
>>> 
>>> My plan was to have -gmodules behave like the latter variant unless 
>>> -gsplit-dwarf is also present; this way there wouldn't be any weird 
>>> Darwin-specific code paths.
>>> 
>>> Not sure I quite follow (mostly my fault given the rambling paragraph up 
>>> there) - given the lack of a dsymutil-like tool on other platforms as part 
>>> of the common tool path for debug info, I'm not sure module debug info 
>>> without split dwarf is viable in that world. There's no tool to read these 
>>> extra files at any point.
>> 
>> In theory someone could port llvm-dsymutil to a different platform, but that 
>> scenario is a little far-fetched. I’m not sure what will happen if LLDB is 
>> presented with linked, non-split debug info that contains module references.
>> 
>> Linked non-split debug info should come out for free - all the debug info 
>> would be is a bunch of TUs in a single comdat - no skeleton CU, nothing 
>> else. It would look just like normal DWARF, except with one comdat instead 
>> of multiple, for each set of types from a module. (& there would be no real 
>> size gains - since you'd be redundantly including all the type information 
>> in every object file)
>>  
>> 
>>> 
>>> I suppose we could be creating one giant comdat for the module's debug info 
>>> (no skeleton unit, no distinct type unit comdats, just one big comdat). But 
>>> we'd probably want/need a tool to do the merging at compile time (like the 
>>> objcopy feature for split-dwarf, but in reverse - we'd compile, then run a 
>>> tool to smoosh all the comdats from the modules onto the object we just 
>>> generated). It wouldn't provide much in the way of space savings, a little 
>>> less stress on the linker (fewer comdats to handle), etc. Not sure if 
>>> there's a default mode of objcopy that would cope with this straight out, 
>>> or whether we'd need a new feature there (which wouldn't be a priority for 
>>> Google to implement, since we use fission, nor a priority for you to 
>>> implement since you have dsymutil, etc - so I'm not sure anyone would 
>>> bother)
>>> 
>>> Long story short: maybe just error on -gmodules if -gsplit-dwarf isn't 
>>> specified or the platform isn't darwin? (& if it's darwin, dsymutil could 
>>> read the module skeletons to find which modules to link into the .dSYM?)
>> 
>> That’s reasonable, too :-)
>> The plan is for llvm-dsymutil to follow the references in the module 
>> skeletons, copy the module CUs
>> 
>> TUs for now
>>  
>> into the .dSYM, and fix up the external type references to become 
>> DW_FORM_ref_addrs.
>> 
>> Sounds good for you guys - the fixup work will be a bit non-trivial, since 
>> it'll need to remove the type skeletons in the CUs, move all the extra 
>> members from the skeletons into the type unit (& resolve any duplicates), 
>> etc... - does that make sense? (otherwise I can provide some DWARF snippets 
>> to explain better)
> 
> Or we use a weird Darwin-specific code path to not emit the modules with 
> -generate-type-units in the first place (bag of DWARF+index mapping hash to 
> DIE),
> 
> bag-o-dwarf still doesn't address all the issues with type member merging I 
> described above. Certain things can't go in the type in the module because 
> they depend on context - most importantly/obviously, implicit special members 
> and member function template instantiations.
> 
> I suppose you could still have type references reference the type in the 
> bag-o-dwarf/type unit directly (DW_AT_type with DW_FORM_ref_sig8) while 
> having the partial type (the type declaration with its extra CU-specific 
> members) which would simplify the dwarf in the easy cases.


Yes, something along these lines would make a good first iteration.
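
To make the first-iteration idea concrete, here is a rough sketch of the fixup step as I picture it: an index maps each type signature to the offset where that type ended up after being copied into the .dSYM, and every DW_FORM_ref_sig8 reference found in the index is rewritten as a DW_FORM_ref_addr. The names TypeRef, resolve_type_refs and sig_to_offset are made up for illustration; this is not llvm-dsymutil code, and it ignores the member-merging problem entirely.

```python
from dataclasses import dataclass


@dataclass
class TypeRef:
    """Hypothetical stand-in for a DW_AT_type attribute value."""
    form: str   # "DW_FORM_ref_sig8" or "DW_FORM_ref_addr"
    value: int  # 8-byte type signature, or absolute .debug_info offset


def resolve_type_refs(refs, sig_to_offset):
    """Rewrite signature references as absolute offsets when the index knows the type.

    sig_to_offset is an assumed index: type signature -> offset of the copied
    type DIE. References that are not in the index are left untouched.
    """
    resolved = []
    for ref in refs:
        if ref.form == "DW_FORM_ref_sig8" and ref.value in sig_to_offset:
            resolved.append(TypeRef("DW_FORM_ref_addr", sig_to_offset[ref.value]))
        else:
            resolved.append(ref)
    return resolved
```

References whose signature is missing from the index stay as they are, so a module that could not be found does not corrupt the output.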
>  
> which would make dsymutil's job really easy. As much as I’d like to get rid 
> of platform-specific behavior, due to the automatic way that modules are 
> generated on Darwin I don’t see an elegant way of making this switchable by 
> the user.
> 
> Not sure I quite follow here how implicit modules impact this functionality. 
> We can still have a flag that you pass to the compiler that dictates how 
> debug info in modules is created/what schema we use.

The problem is the combination of implicit generation and a global module 
cache. I guess we could treat a module with the wrong kind of debug info as out 
of date, but I’m not excited.
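
For illustration only, the "treat it as out of date" check could look roughly like the sketch below. CachedModule, needs_rebuild and the debug-info kind labels are hypothetical and do not correspond to anything in clang's module cache today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedModule:
    """Hypothetical record of an implicitly built module in the global cache."""
    name: str
    debug_info_kind: str  # assumed labels, e.g. "type-units" or "bag-of-dwarf"


def needs_rebuild(cache, name, requested_kind):
    """Return True if the module is missing or carries the wrong kind of debug info."""
    cached = cache.get(name)
    return cached is None or cached.debug_info_kind != requested_kind


# Usage: a cache shared by two builds that want different debug info schemas.
cache = {"Foundation": CachedModule("Foundation", "type-units")}
rebuild_same = needs_rebuild(cache, "Foundation", "type-units")
rebuild_other = needs_rebuild(cache, "Foundation", "bag-of-dwarf")
```

The downside is visible in the usage lines: two builds that alternate between schemas would keep invalidating each other's cached modules.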

-- adrian

> 
> - David
>  
> 
> -- adrian
>>  
>> 
>> -- adrian
>> 
> 
>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
