> On Mar 19, 2015, at 5:37 PM, David Blaikie <[email protected]> wrote:
> 
> 
> 
> On Thu, Mar 19, 2015 at 5:24 PM, Adrian Prantl <[email protected]> wrote:
>> 
>> > On Mar 16, 2015, at 2:55 PM, David Blaikie <[email protected]> wrote:
>> >
>> >
>> >
>> >> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul 
>> >> <[email protected]> wrote:
>> > Beyond the above (that using a new tag would mean this would go from 
>> > 'free' to 'not free' for GDB) having a new top level tag is pretty 
>> > substantial (we only have two at the moment, and with our talk of modules 
>> > being a "bag of dwarf" might go back to having one top level tag? (it's 
>> > not clear to me from DWARF4 whether DW_TAG_module is currently a top-level 
>> > tag, I don't think it is?)
>> >
>> >> The .debug_info section contains one or more compilation units, partial 
>> >> units, or in DWARF 5, type units.  DW_TAG_module isn't a unit, if you 
>> >> want it to be handled independently then it would need to be wrapped in a 
>> >> DW_TAG_partial_unit.  You would probably then use DW_TAG_imported_unit to 
>> >> refer to it, rather than DW_TAG_imported_module.
>> >>
>> >
>> > This makes a fair bit of sense - though the terminology's never going to 
>> > quite line up with modules, I suspect, and this would still require 
>> > modifying existing consumers (well, GDB) that can handle split-dwarf 
>> > today, I suspect (not sure how it'd handle partial_unit - maybe that does 
>> > work? - and still don't know how existing consumers would handle 
>> > imported_unit either - could be worth some testing, as it sounds sort of 
>> > right out of several less right options).
>> 
>> Thanks for all the input so far!
>> To make this end of the discussion concrete, let’s sketch some DWARF showing 
>> how this could look in practice.
>> 
>> ELF (no imports)
>> ----------------
>> 
>> On ELF or COFF a foo.c referencing types from the module Foundation looks 
>> like this:
>> 
>> .debug_info:
>>   DW_TAG_compile_unit
>>     DW_AT_name(“foo.c”)
>> 
>> .debug_info.dwo (on ELF: group 0x1234ABCDE, comdat)
>>   DW_TAG_partial_unit
> 
> For now I'd suggest we use compile_unit - that way it'll just work with 
> existing split-dwarf consumers. We can see about standardizing a top-level 
> DW_TAG_module or using DW_TAG_partial_unit here later, perhaps? I'm not sure.
>  
>>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>>     DW_AT_dwo_id(“0x1234ABCDE”)
>> 
>> 
>> Side question: Is .debug_info.dwo the right section to put the module 
>> skeleton in, or should it be a .debug_info section like normal fission 
>> skeletons?
> 
> Skeletons go in .debug_info, the dwo sections are just for the .dwo file (or 
> the module file, in our new case - the extension isn't actually important).
> 
> It might be worth you compiling an example or two of split-dwarf to see how 
> this all works hands-on.
>  
>> Mach-O (no comdat, no imports)
>> ------------------------------
>> 
>> Mach-O doesn’t do comdat, so with -split-dwarf=Disable (not sure if that 
>> option is the best discriminator) this could look like:
>> 
>> .debug_info:
>>   DW_TAG_compile_unit
>>     DW_AT_name(“foo.c”)
>>   DW_TAG_partial_unit
>>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>>     DW_AT_dwo_id(“0x1234ABCDE”)
>> 
>> 
>> Mach-O (no comdat, with imports)
>> ------------------------------
>> 
>> If we add the module import information to this, we get:
>> 
>> .debug_info:
>>   DW_TAG_compile_unit
>>     DW_AT_name(“foo.c”)
>>     DW_TAG_imported_module
>>       DW_AT_import(DW_FORM_ref_addr 0x10)
> 
> Since we went down the tangent of explaining split-dwarf many emails ago, 
> I've forgotten (& can't readily find) what we were discussing about what ways 
> the imported_module could work.
> 
> The simplest representation I can think of would be to have it reference, by 
> signature, the module unit (whatever tag it uses) - DW_FORM_ref_sig8, seems 
> the simplest thing to do.
>  
>> 
>>   DW_TAG_partial_unit
>>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>>     DW_AT_dwo_id(“0x1234ABCDE”)
>> 
>> 0x10:
> 
> This is inside the partial unit? I figured we'd just put these attributes on 
> the top level (compile_unit, or whatever it might be later) - potentially 
> conditionalized on platform, sure.
>  
>>     DW_TAG_module
>>       DW_AT_name(“Foundation”)
>>       DW_AT_LLVM_sysroot(“/“)
>>       DW_AT_LLVM_include_dir(“”)
>>       DW_AT_LLVM_macros(“-DNDEBUG”)
>>       ...
>> 
>> 
>> ELF (comdat, with imports)
>> --------------------------
>> 
>> But now let’s go back to ELF. Since the skeleton with the partial unit is 
>> comdat'd, I assume that this breaks the FORM_ref_addr used in the 
>> DW_AT_import. We could reuse the module hash as a signature for the module:
>> 
>> .debug_info:
>>   DW_TAG_compile_unit
>>     DW_AT_name(“foo.c”)
>>     DW_TAG_imported_module
>>       DW_AT_import(DW_FORM_ref_addr 0x1234ABCDE)
> 
> Still only really need these imported_modules for lldb, right? I'd consider 
> having them off-by-default for non-darwin, but I'm not strictly wedded to 
> that notion. Wouldn't mind seeing size impact numbers of some kind - if it's 
> really fractional % increase & GDB doesn't fall over when it sees them (in 
> whatever FORM/tag/etc we decide on) then that's not the end of the world.
> 
> Just seems nice if the default mode is the nice, standard, split-dwarf 
> output. Doesn't need anything fancy.
>  
> 
>> .debug_info.dwo (group 0x1234ABCDE, comdat)
>>   DW_TAG_partial_unit
>>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”)
>>     DW_AT_dwo_id(“0x1234ABCDE”)
>> 
>>     DW_TAG_module
>>       DW_AT_signature(“0x1234ABCDE”)
>>       DW_AT_name(“Foundation”)
> 
> 
> The thing you haven't covered is the actual .dwo sections (.debug_info.dwo 
> (we'll probably need a simple stub compile_unit to make this correct 
> split-dwarf) and .debug_types.dwo being important - but all the supporting 
> .dwo sections will be necessary) that go in the module file.
>  
>> This is bending the definition of DW_AT_signature, but I guess it could be 
>> made to work. Or we could say that for now, users have to choose between the 
>> comdat optimization and having the module imports recorded in Dwarf, since 
>> GDB wouldn’t know what to do with that information anyway.

Sorry for the long delay. Here’s a more complete example that should include 
all the suggestions made so far. For context I also included external type 
references in the example although admittedly this is a bit out of scope for 
this thread:

ELF (typeunits, comdats, with imports)
--------------------------------------

On ELF or COFF a bar.c referencing type Foo from the module FooLib looks like 
this:

bar.o
~~~~~

// To keep this example focussed/readable, I'm assuming that bar.o itself was
// not compiled with fission.
.debug_info:
  DW_TAG_compile_unit
    DW_AT_name(“bar.c”)
    ...

    DW_TAG_imported_module // <- This could be optional on ELF.
      DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234)

    DW_TAG_variable
      DW_AT_name(“MyFoo”)
      DW_AT_type [DW_FORM_ref4] 0x20
0x20:
    DW_TAG_structure_type
      DW_AT_declaration (true)
      DW_AT_signature [DW_FORM_ref_sig8] (0xF00)


// Split DWARF skeleton CU for the module FooLib.
  DW_TAG_compile_unit
    DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
    DW_AT_dwo_id(“0xFEDB9876”)
    ...

// Comdat’d partial unit containing the optional module descriptor.
.debug_info, group 0xABCD1234, comdat
  DW_TAG_partial_unit
    DW_TAG_module
      DW_AT_name(“FooLib”)
      DW_AT_LLVM_sysroot(“/”)
      DW_AT_LLVM_include_dirs(“-I/path”)
      DW_AT_LLVM_macros(“-DNDEBUG”)
      ...

FooLib-XYZ.pcm
~~~~~~~~~~~~~~

.debug_info.dwo
  DW_TAG_compile_unit
    DW_AT_dwo_id(“0xFEDB9876”)
    ...

// Type unit for the type Foo.
.debug_types.dwo, group 0xF00, comdat
  DW_TAG_type_unit
    DW_TAG_structure_type
      DW_AT_name (“Foo”)
      ...


I think it is awkward to have both the skeleton compile_unit in .debug_info and 
the partial_unit containing the TAG_module. Personally I’d prefer putting the 
TAG_module into the skeleton CU and then just refer to it via a FORM_ref_addr; 
but if we want to put the TAG_module into a comdat section, it looks like 
that’s what’s necessary.
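
To make the ELF lookup concrete, here is a rough Python sketch of how a consumer might tie the pieces together: match the skeleton CU to the module file by DW_AT_dwo_id, then resolve the DW_FORM_ref_sig8 reference to a type unit. The dictionaries and the function name `resolve_type` are hypothetical stand-ins for parsed DWARF, not any real consumer's API.

```python
# Hypothetical, simplified model of the ELF example; not a real DWARF parser.

# Skeleton CUs found in bar.o's .debug_info: dwo_id -> dwo_name.
skeletons = {
    0xFEDB9876: "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm",
}

# Each module file: its own dwo_id plus its type units keyed by signature.
module_files = {
    "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm": {
        "dwo_id": 0xFEDB9876,
        "type_units": {0xF00: {"tag": "DW_TAG_structure_type", "name": "Foo"}},
    },
}


def resolve_type(signature):
    """Return the type DIE for a DW_FORM_ref_sig8 signature, or None."""
    for dwo_id, dwo_name in skeletons.items():
        module = module_files.get(dwo_name)
        # Only trust a module file whose dwo_id matches the skeleton's.
        if module is None or module["dwo_id"] != dwo_id:
            continue
        die = module["type_units"].get(signature)
        if die is not None:
            return die
    return None


foo = resolve_type(0xF00)  # the DIE for "Foo" in this toy model
```

The dwo_id check stands in for the usual split DWARF rule that a stale or mismatched .dwo (here: module) file must be rejected.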




Mach-O (no typeunits, no comdats, with imports)
-----------------------------------------------

Since we don’t have comdat sections in Mach-O and we don’t have the tool 
support for type units, the way that external types can be referenced 
necessarily needs to be a bit different. The design that Greg and I came up 
with for Mach-O relies on llvm-dsymutil to fix up the DWARF for 
non-module-aware consumers. Just as ELF DWARF consumers need not be able to 
tell the difference between module debugging and split DWARF, on Mach-O the 
.dSYM bundle generated by llvm-dsymutil looks like traditional DWARF.

There are three differences in the DWARF output that make this possible:
  - Refer to external types by UID rather than by type signature.
    (This doubles as the key that allows a debugger to look up and import the type
     directly from the AST and protects us against hash collisions)
  - Add an index to the .o file that maps UID -> module file.
    (Fast lookup + UIDs for C and ObjC are only unique within a module)
  - Add an entry for each type’s UID to the types accelerator table.
    (Fast lookup)
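
As a rough illustration of the second and third points, the lookup a module-aware consumer would do can be modeled with two dictionaries. This is a sketch with hypothetical names (`find_type_die`, `exttypes`, `apple_types_by_module`) and made-up in-memory tables, not the actual table encoding.

```python
# Per-.o index (.apple_exttypes in this proposal): UID -> module file.
exttypes = {
    "_ZTS3Foo": "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm",
}

# Per-module .apple_types table: name or UID -> offset of the type DIE.
apple_types_by_module = {
    "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm": {
        "Foo": 0x80,
        "_ZTS3Foo": 0x80,
    },
}


def find_type_die(uid):
    """Return (module file, DIE offset) for an external type UID, or None."""
    module = exttypes.get(uid)
    if module is None:
        return None
    offset = apple_types_by_module.get(module, {}).get(uid)
    if offset is None:
        return None
    return module, offset
```

Going through the module first is what keeps the scheme safe when a UID is only unique within one module.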

bar.o
~~~~~

.debug_info:
  DW_TAG_compile_unit
    DW_AT_name(“bar.c”)
    DW_TAG_imported_module
      DW_AT_import(DW_FORM_ref_addr 0x40)

    DW_TAG_variable
      DW_AT_name(“MyFoo”)
      DW_AT_type [DW_FORM_strp] (“_ZTS3Foo”)  // We could use a custom FORM here

  // Skeleton unit.
  DW_TAG_compile_unit
    DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
    DW_AT_dwo_id(“0xFEDB9876”)
    ...
0x40:
    DW_TAG_module
      DW_AT_name(“FooLib”)
      DW_AT_LLVM_sysroot(“/”)
      DW_AT_LLVM_include_dirs(“-I/path”)
      DW_AT_LLVM_macros(“-DNDEBUG”)

// This index uses the usual accelerator table format.
.apple_exttypes:
{ “_ZTS3Foo” => debug_str offset of “/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm” }

FooLib-XYZ.pcm
~~~~~~~~~~~~~~

.debug_info
  DW_TAG_compile_unit
    DW_AT_dwo_id(“0xFEDB9876”)

0x80:
  DW_TAG_structure_type
    DW_AT_name (“Foo”)
    DW_AT_signature
    ...

// In addition to the entry for “Foo”, there is also an entry for the type’s
// UID “_ZTS3Foo” pointing to the type definition DIE.
.apple_types
{ “Foo” => 0x80 }
{ “_ZTS3Foo” => 0x80 }
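
For reference, the existing Apple accelerator tables hash their keys with the DJB hash and pick a bucket by taking the hash modulo the bucket count. A minimal sketch, assuming the proposed .apple_exttypes table would reuse that scheme unchanged; the function names and the bucket count used in the example call are arbitrary.

```python
def djb_hash(name: str) -> int:
    """32-bit DJB hash: start at 5381, then h = h * 33 + byte for each byte."""
    h = 5381
    for byte in name.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF  # keep the value within 32 bits
    return h


def bucket_index(name: str, bucket_count: int) -> int:
    """Bucket that a key such as a type name or UID would land in."""
    return djb_hash(name) % bucket_count


# Both keys for the same type hash independently, so they may land in
# different buckets while pointing at the same DIE offset.
foo_bucket = bucket_index("Foo", 8)
uid_bucket = bucket_index("_ZTS3Foo", 8)
```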



When the debug info linker (llvm-dsymutil) is run, it first pulls in the 
.debug_info section from the clang module and fixes up all the DW_FORM_strp 
external type references by turning them into a DW_FORM_ref_addr that 
references the type in the DW_TAG_compile_unit pulled in from the module. To 
find the correct type DIE it looks up the UID in the .apple_exttypes index, 
finds the module, looks up the UID in the regular .apple_types accelerator 
table and replaces the temporary DW_FORM_strp with a DW_FORM_ref_addr (which 
incidentally takes up the same amount of space in the DIE).
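
The fix-up step can be sketched roughly as follows. This assumes 32-bit DWARF, where both DW_FORM_strp and DW_FORM_ref_addr are encoded as a 4-byte offset (for DW_FORM_ref_addr that holds from DWARF 3 on), little-endian data, and a hypothetical `module_base` map giving where each module's unit was placed in the linked .debug_info. None of the names below come from llvm-dsymutil itself, and the matching change of the form code in the abbreviation table is not modeled.

```python
import struct


def fix_up_external_type(debug_info, attr_offset, debug_str, exttypes,
                         apple_types_by_module, module_base):
    """Rewrite a 4-byte DW_FORM_strp UID reference into a DW_FORM_ref_addr value."""
    # Read the .debug_str offset currently stored in the attribute.
    (str_offset,) = struct.unpack_from("<I", debug_info, attr_offset)
    end = debug_str.index(b"\0", str_offset)
    uid = debug_str[str_offset:end].decode("ascii")

    module = exttypes[uid]                           # UID -> module file
    die_offset = apple_types_by_module[module][uid]  # UID -> DIE offset in module

    # Overwrite in place: the attribute keeps its size, so no other
    # offsets in the unit need to move.
    target = module_base[module] + die_offset
    struct.pack_into("<I", debug_info, attr_offset, target)
    return target
```

Because the old and new values occupy the same four bytes, the rewrite does not shift any DIE, which is what makes the temporary DW_FORM_strp encoding cheap to patch.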


Thoughts?
-- 
adrian
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
