> On Apr 30, 2015, at 4:55 PM, David Blaikie <[email protected]> wrote:
> 
> 
> 
> On Thu, Apr 30, 2015 at 4:31 PM, Adrian Prantl <[email protected]> wrote:
>> 
>> > On Mar 19, 2015, at 5:37 PM, David Blaikie <[email protected]> wrote:
>> >
>> >
>> >
>> > On Thu, Mar 19, 2015 at 5:24 PM, Adrian Prantl <[email protected]> wrote:
>> >>
>> >> > On Mar 16, 2015, at 2:55 PM, David Blaikie <[email protected]> wrote:
>> >> >
>> >> >
>> >> >
>> >> >> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul 
>> >> >> <[email protected]> wrote:
>> >> > Beyond the above (that using a new tag would mean this would go from 
>> >> > 'free' to 'not free' for GDB) having a new top level tag is pretty 
>> >> > substantial (we only have two at the moment, and with our talk of 
>> >> > modules being a "bag of dwarf" might go back to having one top level 
>> >> > tag? (it's not clear to me from DWARF4 whether DW_TAG_module is 
>> >> > currently a top-level tag, I don't think it is?)
>> >> >
>> >> >> The .debug_info section contains one or more compilation units, 
>> >> >> partial units, or in DWARF 5, type units.  DW_TAG_module isn't a unit, 
>> >> >> if you want it to be handled independently then it would need to be 
>> >> >> wrapped in a DW_TAG_partial_unit.  You would probably then use 
>> >> >> DW_TAG_imported_unit to refer to it, rather than 
>> >> >> DW_TAG_imported_module.
>> >> >>
>> >> >
>> >> > This makes a fair bit of sense - though the terminology's never going 
>> >> > to quite line up with modules, I suspect, and this would still require 
>> >> > modifying existing consumers (well, GDB) that can handle split-dwarf 
>> >> > today, I suspect (not sure how it'd handle partial_unit - maybe that 
>> >> > does work? - and still don't know how existing consumers would handle 
>> >> > imported_unit either - could be worth some testing, as it sounds sort 
>> >> > of right out of several less right options).
>> >>
>> >> Thanks for all the input so far!
>> >> To make this end of the discussion concrete, let’s sketch some DWARF showing 
>> >> how this could look in practice.
>> >>
>> >> ELF (no imports)
>> >> ----------------
>> >>
>> >> On ELF or COFF a foo.c referencing types from the module Foundation looks 
>> >> like this:
>> >>
>> >> .debug_info:
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_name(“foo.c”)
>> >>
>> >> .debug_info.dwo (on ELF: group 0x1234ABCDE, comdat)
>> >>   DW_TAG_partial_unit
>> >
>> > For now I'd suggest we use compile_unit - that way it'll just work with 
>> > existing split-dwarf consumers. We can see about standardizing a top-level 
>> > DW_TAG_module or using DW_TAG_partial_unit here later, perhaps? I'm not 
>> > sure.
>> >
>> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >>
>> >>
>> >> Side question: Is .debug_info.dwo the right section to put the module 
>> >> skeleton in, or should it be a .debug_info section like normal fission 
>> >> skeletons?
>> >
>> > Skeletons go in .debug_info, the dwo sections are just for the .dwo file 
>> > (or the module file, in our new case - the extension isn't actually 
>> > important).
>> >
>> > It might be worth you compiling an example or two of split-dwarf to see 
>> > how this all works hands-on.
>> >
>> >> Mach-O (no comdat, no imports)
>> >> ------------------------------
>> >>
>> >> Mach-O doesn’t do comdat, so with -split-dwarf=Disable (not sure if that 
>> >> option is the best discriminator) this could look like:
>> >>
>> >> .debug_info:
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_name(“foo.c”)
>> >>   DW_TAG_partial_unit
>> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >>
>> >>
>> >> Mach-O (no comdat, with imports)
>> >> --------------------------------
>> >>
>> >> If we add the module import information to this, we get:
>> >>
>> >> .debug_info:
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_name(“foo.c”)
>> >>     DW_TAG_imported_module
>> >>       DW_AT_import(DW_FORM_ref_addr 0x10)
>> >
>> > Since we went down the tangent of explaining split-dwarf many emails 
>> > ago, I've forgotten (& can't readily find) what we were discussing about 
>> > what ways the imported_module could work.
>> >
>> > The simplest representation I can think of would be to have it reference, 
>> > by signature, the module unit (whatever tag it uses) - DW_FORM_ref_sig8, 
>> > seems the simplest thing to do.
>> >
>> >>
>> >>   DW_TAG_partial_unit
>> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >>
>> >> 0x10:
>> >
>> > This is inside the partial unit? I figured we'd just put these attributes 
>> > on the top level (compile_unit, or whatever it might be later) - 
>> > potentially conditionalized on platform, sure.
>> >
>> >>     DW_TAG_module
>> >>       DW_AT_name(“Foundation”)
>> >>       DW_AT_LLVM_sysroot(“/“)
>> >>       DW_AT_LLVM_include_dir(“”)
>> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
>> >>       ...
>> >>
>> >>
>> >> ELF (comdat, with imports)
>> >> --------------------------
>> >>
>> >> But now let’s go back to ELF. Since the skeleton with the partial unit is 
>> >> comdat'd, I assume that this breaks the FORM_ref_addr used in the 
>> >> DW_AT_import. We could reuse the module hash as a signature for the 
>> >> module:
>> >>
>> >> .debug_info:
>> >>   DW_TAG_compile_unit
>> >>     DW_AT_name(“foo.c”)
>> >>     DW_TAG_imported_module
>> >>       DW_AT_import(DW_FORM_ref_addr 0x1234ABCDE)
>> >
>> > Still only really need these imported_modules for lldb, right? I'd 
>> > consider having them off-by-default for non-darwin, but I'm not strictly 
>> > wedded to that notion. Wouldn't mind seeing size impact numbers of some 
>> > kind - if it's really fractional % increase & GDB doesn't fall over when 
>> > it sees them (in whatever FORM/tag/etc we decide on) then that's not the 
>> > end of the world.
>> >
>> > Just seems nice if the default mode is the nice, standard, split-dwarf 
>> > output. Doesn't need anything fancy.
>> >
>> >
>> >> .debug_info.dwo (group 0x1234ABCDE, comdat)
>> >>   DW_TAG_partial_unit
>> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>> >>     DW_AT_dwo_id(“0x1234ABCDE”)
>> >>
>> >>     DW_TAG_module
>> >>       DW_AT_signature(“0x1234ABCDE”)
>> >>       DW_AT_name(“Foundation”)
>> >
>> >
>> > The thing you haven't covered is the actual .dwo sections (.debug_info.dwo 
>> > (we'll probably need a simple stub compile_unit to make this correct 
>> > split-dwarf) and .debug_types.dwo being important - but all the supporting 
>> > .dwo sections will be necessary) that go in the module file.
>> >
>> >> This is bending the definition of DW_AT_signature, but I guess it could 
>> >> be made to work. Or we could say that for now, users have to choose 
>> >> between the comdat optimization and having the module imports recorded in 
>> >> Dwarf, since GDB wouldn’t know what to do with that information anyway.
>> 
>> Sorry for the long delay. Here’s a more complete example that should include 
>> all the suggestions made so far. For context I also included external type 
>> references in the example although admittedly this is a bit out of scope for 
>> this thread:
>> 
>> ELF (typeunits, comdats, with imports)
>> --------------------------------------
>> 
>> On ELF or COFF a bar.c referencing type Foo from the module FooLib looks 
>> like this:
>> 
>> bar.o
>> ~~~~~
>> 
>> // To keep this example focussed/readable, I'm assuming that bar.o itself 
>> // was not compiled with fission.
>> .debug_info:
>>   DW_TAG_compile_unit
>>     DW_AT_name(“bar.c”)
>>     ...
>> 
>>     DW_TAG_imported_module // <- This could be optional on ELF.
>>       DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234)
>> 
>>     DW_TAG_variable
>>       DW_AT_name(“MyFoo”)
>>       DW_AT_type [DW_FORM_ref4] 0x20
>> 0x20:
>>     DW_TAG_structure_type
>>       DW_AT_declaration (true)
>>       DW_AT_signature [DW_FORM_ref_sig8] (0xF00)
>> 
>> 
>> // Split DWARF skeleton CU for the module FooLib.
>>   DW_TAG_compile_unit
>>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>>     DW_AT_dwo_id(“0xFEDB9876”)
>>     ...
>> 
>> // Comdat’d partial unit containing the optional module descriptor.
>> .debug_info, group 0xABCD1234, comdat
>>   DW_TAG_partial_unit
>>     DW_TAG_module
>>       DW_AT_name(“FooLib”)
>>       DW_AT_LLVM_sysroot(“/“)
>>       DW_AT_LLVM_include_dirs(“-I/path”)
>>       DW_AT_LLVM_macros(“-DNDEBUG”)
>>       ...
>> 
>> FooLib-XYZ.pcm
>> ~~~~~~~~~~~~~~
>> 
>> .debug_info.dwo
>>   DW_TAG_compile_unit
>>     DW_AT_dwo_id(“0xFEDB9876”)
>>     ...
>> 
>> // Type unit for the type Foo.
>> .debug_types.dwo, group 0xF00, comdat
>>   DW_TAG_type_unit
>>     DW_TAG_structure_type
>>       DW_AT_name (“Foo”)
>>       ...
>> 
>> 
>> I think it awkward to have both the skeleton compile_unit in .debug_info and 
>> the partial_unit containing the TAG_module. Personally I’d prefer putting 
>> the TAG_module into the skeleton CU and then just refer to it via a 
>> FORM_ref_addr; but if we want to put the TAG_module into a comdat section, 
>> it looks like that’s what’s necessary.
> 
> It's been a while & I've probably lost all the context, but I think my 
> original theory was to have the skeleton compile_unit be comdat'd so they'd 
> deduplicate on linking (so we'd only have one reference to the module.dwo in 
> the linked binary). I don't recall there being a need for a separate 
> partial_unit - I imagine we'd just put the LLDB/LLVM extension attributes on 
> the skeleton compile_unit and expect debuggers that didn't understand them, 
> to ignore them.
> 
> Was there some reason this didn't work/make sense? Because you need a 
> DW_TAG_module to import with DW_TAG_imported_module?
Using DW_TAG_module was the best practice that was recommended on dwarf-discuss.

> If it turns out that's the right way to get a target for the imported_module, 
> we could put both the skeleton CU and the partial unit in the same comdat and 
> dedup them both together.

I think this works as long as we only have one TAG_module per .pcm file 
(because we need to refer to it via signature). But if we don’t mind having 
duplicate dwo_* references in the same .o file this would also work with more 
than one TAG_module (or submodules).


.debug_info:
 DW_TAG_compile_unit
   DW_AT_name(“bar.c”)
   ...

   DW_TAG_imported_module // <- This could be optional on ELF.
     DW_AT_import [DW_FORM_ref_sig8] (0xFEDB9876)

   ...

// Comdat’d split DWARF skeleton CU for the module FooLib.
.debug_info, group 0xFEDB9876, comdat
 DW_TAG_compile_unit
   DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
   DW_AT_dwo_id(“0xFEDB9876”)
   ...

   DW_TAG_module
     DW_AT_name(“FooLib”)
     DW_AT_LLVM_sysroot(“/“)
     DW_AT_LLVM_include_dirs(“-I/path”)
     DW_AT_LLVM_macros(“-DNDEBUG”)
     ...
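
To make the lookup concrete, here is a minimal Python sketch of how a consumer could resolve the DW_FORM_ref_sig8 import against the comdat'd skeleton CU. The dict-based unit model and the names index_skeletons and resolve_import are made up for illustration; this is not LLDB or LLVM code, only a sketch under those assumptions.

```python
# Hypothetical in-memory model of the units in bar.o; not a real DWARF parser.
units = [
    {"tag": "DW_TAG_compile_unit", "name": "bar.c",
     "imports": [0xFEDB9876]},            # DW_AT_import [DW_FORM_ref_sig8]
    {"tag": "DW_TAG_compile_unit",        # comdat'd skeleton CU
     "dwo_name": "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm",
     "dwo_id": 0xFEDB9876,
     "modules": ["FooLib"]},              # DW_TAG_module children
]


def index_skeletons(units):
    """Map dwo_id -> skeleton CU. A signature names a unit, not a DIE in it."""
    index = {}
    for unit in units:
        if "dwo_id" in unit:
            # setdefault keeps the first unit seen for a given dwo_id.
            index.setdefault(unit["dwo_id"], unit)
    return index


def resolve_import(signature, index):
    """Return (module names, .pcm path) for an imported_module signature."""
    skeleton = index[signature]
    return skeleton["modules"], skeleton["dwo_name"]


index = index_skeletons(units)
modules, pcm = resolve_import(units[0]["imports"][0], index)
print(modules, pcm)
```

Because the signature only identifies the unit, a skeleton with two DW_TAG_module children would be ambiguous in this model, which is why more than one module per .pcm would need one skeleton (and one set of dwo_* attributes) per module.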


> 
> But this gets into complicated territory when the original binary is built 
> with fission... which will be relevant for modules on ELF with LLDB. Hmm, 
> maybe it's not too complicated - the partial_unit would end up in the .dwo 
> file (maybe we'd have to teach the .dwo file to deduplicate these too - the 
> same way it does for type units... - might require a new header to include 
> the hash, etc :/)... would be tricky to have the dwp tool resolve the 
> relocations to these things. Cross-unit references as you've got there aren't 
> something that every DWARF consumer is totally cool with, I don't think?

Ah. I thought the deduplication happens because all ELF sections sharing the 
same group are uniqued based on the group id. It certainly would be nice if we 
could avoid introducing a new .debug_info header...
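
As a rough Python sketch of that assumption: the linker keeps the first section group it sees for each group id and discards later ones. The link function, the tuple layout, and the second object file baz.o are invented for illustration.

```python
# Each object file is a list of (section name, group id or None, payload).
bar_o = [
    (".debug_info", None, "CU bar.c"),
    (".debug_info", 0xFEDB9876, "skeleton CU FooLib"),
]
baz_o = [  # hypothetical second object importing the same module
    (".debug_info", None, "CU baz.c"),
    (".debug_info", 0xFEDB9876, "skeleton CU FooLib"),
]


def link(objects):
    """Concatenate sections, keeping only the first group per group id."""
    seen_groups = set()
    output = []
    for sections in objects:
        for name, group, payload in sections:
            if group is not None:
                if group in seen_groups:
                    continue  # duplicate comdat group: discarded
                seen_groups.add(group)
            output.append((name, payload))
    return output


linked = link([bar_o, baz_o])
for name, payload in linked:
    print(name, payload)
```

In this model the linked output carries both compile units but only one skeleton CU for FooLib, without any extra header in .debug_info.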

> 
> Sort of inclined to have the imported module stuff just for LLDB, but I've 
> lost some of the context for that in the ensuing weeks.

-- adrian

>  
>> 
>> 
>> 
>> 
>> MachO (no typeunits, no comdats, with imports)
>> ----------------------------------------------
>> 
>> Since we don’t have comdat sections in Mach-O and we don’t have the tool 
>> support for type units, the way that external types can be referenced 
>> necessarily needs to be a bit different. The design that Greg and I came up 
>> with for Mach-O relies on llvm-dsymutil to fix up the DWARF for 
>> non-module-aware consumers. Just as ELF DWARF consumers need not be able to 
>> tell the difference between module debugging and split DWARF, on Mach-O the 
>> .dSYM bundle generated by llvm-dsymutil looks like traditional DWARF.
>> 
>> There are three differences in the DWARF output that make this possible:
>>   - Refer to external types by UID rather than by type signature.
>>     (This doubles as the key that allows a debugger to import the type
>>      directly from the AST and protects us against hash collisions)
>>   - Add an index to the .o file that maps UID -> module file.
>>     (Fast lookup + UIDs for C and ObjC are only unique within a module)
>>   - Add an entry for each type’s UID to the types accelerator table.
>>     (Fast lookup)
>> 
>> bar.o
>> ~~~~~
>> 
>> .debug_info:
>>   DW_TAG_compile_unit
>>     DW_AT_name(“bar.c”)
>>     DW_TAG_imported_module
>>       DW_AT_import(DW_FORM_ref_addr 0x40)
>> 
>>     DW_TAG_variable
>>       DW_AT_name(“MyFoo”)
>>       DW_AT_type [DW_FORM_strp] (“_ZTS3Foo”)  // We could use a custom FORM here
>> 
>>   // Skeleton unit.
>>   DW_TAG_compile_unit
>>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>>     DW_AT_dwo_id(“0xFEDB9876”)
>>     ...
>> 0x40:
>>     DW_TAG_module
>>       DW_AT_name(“FooLib”)
>>       DW_AT_LLVM_sysroot(“/“)
>>       DW_AT_LLVM_include_dirs(“-I/path”)
>>       DW_AT_LLVM_macros(“-DNDEBUG”)
>> 
>> // This index uses the usual accelerator table format.
>> .apple_exttypes:
>> { “_ZTS3Foo” => debug_str offset of “/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm” }
>> 
>> FooLib-XYZ.pcm
>> ~~~~~~~~~~~~~~
>> 
>> .debug_info
>>   DW_TAG_compile_unit
>>     DW_AT_dwo_id(“0xFEDB9876”)
>> 
>> 0x80:
>>   DW_TAG_structure_type
>>     DW_AT_name (“Foo”)
>>     DW_AT_signature
>>     ...
>> 
>> // In addition to the entry for “Foo”, there is also an entry for the type’s 
>> // UID “_ZTS3Foo” pointing to the type definition DIE.
>> .apple_types
>> { “Foo” => 0x80 }
>> { “_ZTS3Foo” => 0x80 }
>> 
>> 
>> 
>> When the debug info linker (llvm-dsymutil) is run, it first pulls in the 
>> .debug_info section from the clang module and fixes up all the DW_FORM_strp 
>> external type references by turning them into a DW_FORM_ref_addr that 
>> references the type in the DW_TAG_compile_unit pulled in from the module. To 
>> find the correct type DIE it looks up the UID in the .apple_exttypes index, 
>> finds the module, looks up the UID in the regular .apple_types accelerator 
>> table and replaces the temporary DW_FORM_strp with a DW_FORM_ref_addr (which 
>> incidentally takes up the same amount of space in the DIE).
>> 
>> 
>> Thoughts?
>> --
>> adrian
>> 
> 
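
For reference, the llvm-dsymutil fix-up described in the quoted Mach-O section can be sketched in Python as follows. The dict layouts, the module_base offset (an arbitrary 0x1000), and the fix_up name are assumptions for illustration; the real tool works on binary DWARF, not on dicts.

```python
# Hypothetical stand-ins for the tables in the quoted Mach-O example.
PCM = "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm"

apple_exttypes = {"_ZTS3Foo": PCM}                     # bar.o: UID -> module file
apple_types = {PCM: {"Foo": 0x80, "_ZTS3Foo": 0x80}}   # module: name/UID -> DIE

# Assumed: where each module's .debug_info lands in the linked output.
module_base = {PCM: 0x1000}

bar_dies = [
    {"tag": "DW_TAG_variable", "name": "MyFoo",
     "type": ("DW_FORM_strp", "_ZTS3Foo")},
]


def fix_up(dies, exttypes, types, bases):
    """Rewrite DW_FORM_strp external type references to DW_FORM_ref_addr."""
    for die in dies:
        attr = die.get("type")
        if attr is None:
            continue
        form, uid = attr
        if form != "DW_FORM_strp":
            continue                       # already an ordinary reference
        module = exttypes[uid]             # 1. UID -> module file
        offset = types[module][uid]        # 2. UID -> DIE offset in the module
        # 3. Replace the temporary string reference with a section offset.
        die["type"] = ("DW_FORM_ref_addr", bases[module] + offset)
    return dies


fix_up(bar_dies, apple_exttypes, apple_types, module_base)
print(bar_dies[0]["type"][0], hex(bar_dies[0]["type"][1]))
```

With the assumed base of 0x1000, the reference for MyFoo ends up pointing at 0x1080, that is, at the Foo definition at 0x80 inside the pulled-in module unit.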

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
