Re: [lldb-dev] [cfe-dev] Plans for module debugging

Ben Langmuir Mon, 01 Dec 2014 11:13:22 -0800

> On Nov 25, 2014, at 5:25 PM, Adrian Prantl <apra...@apple.com> wrote:
> 
>> 
>> On Nov 24, 2014, at 4:55 PM, Richard Smith <rich...@metafoo.co.uk> wrote:
>> 
>> On Fri, Nov 21, 2014 at 5:52 PM, Adrian Prantl <apra...@apple.com> wrote:
>> Plans for module debugging
>> ==========================
>> 
>> I recently had a chat with Eric Christopher and David Blaikie to discuss 
>> ideas for debug info for Clang modules and this is what we came up with.
>> 
>> Goals
>> -----
>> 
>> Clang modules [1] (and their siblings, C++ modules and precompiled header 
>> files) are a method for improving compile time by making the serialized AST 
>> for commonly used header files directly available to the compiler.
>> 
>> Currently, debug info is totally oblivious to this: when the developer 
>> compiles a file that uses a type from a module, clang simply emits a copy of 
>> the full definition (some exceptions apply for C++) of this type in DWARF 
>> into the debug info section of the resulting object file. That's a lot of 
>> copies.
>> 
>> The key idea is to emit DWARF for types defined in modules only once, and 
>> then only emit references to these types in all the individual compile units 
>> that import this module. We are going to build on the split DWARF and type 
>> unit facilities provided by DWARF for this. DWARF consumers can follow the 
>> type references into the module debug info section, much as they 
>> resolve types in external type units today. Additionally, the format will 
>> allow consumers that support clang modules natively (such as LLDB) to 
>> directly look up types in the module, without having to go through the usual 
>> translation from AST to DWARF and back to AST.
>> 
>> The primary benefit from doing all this is performance. This change is 
>> expected to reduce the size of the debug info in object files significantly 
>> by
>> - emitting only references to the full types and thus
>> - implicitly uniquing types that are defined in modules.
>> The smaller object files will result in faster compile times and faster 
>> llvm::Module load times when doing LTO. The type uniquing will also result 
>> in significantly smaller debug info for the finished executables, especially 
>> for C and Objective-C, which do not support ODR-based type uniquing. This 
>> comes at the price of longer initial module build times, as debug info is 
>> emitted alongside the module.
>> 
>> Design
>> ------
>> 
>> Clang modules are designed to be ephemeral build artifacts that live in a 
>> shared module cache. Compiling a source file that imports `MyModule` results 
>> in `MyModule.pcm` being generated in the module cache directory, which 
>> contains the serialized AST of the declarations found in the header files 
>> that comprise the module.
>> 
>> We will change the binary clang module format to become a container (ELF, 
>> Mach-O, depending on the platform). Inside the container there will be 
>> multiple sections: one containing the serialized AST, and ones containing 
>> DWARF5 split debug type information for all types defined in the module that 
>> can be encoded in DWARF. By virtue of using type units, each type is emitted 
>> into its own type unit which can be identified via a unique type signature. 
>> DWARF consumers can use the type signatures to look up type definitions in 
>> the module debug info section. For module-aware consumers (LLDB), we will 
>> add an index that maps type signatures directly to an offset in the AST 
>> section.
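
One way to picture the signature index described above is as a sorted table of fixed-size entries that a consumer searches with a binary search. The sketch below is only an illustration under stated assumptions: the entry layout (an 8-byte signature followed by an 8-byte offset, little-endian) and the helper names `build_index` and `lookup` are hypothetical, not part of the proposal.

```python
import struct

# Hypothetical entry layout: 8-byte type signature, 8-byte offset into the AST section.
ENTRY = struct.Struct("<QQ")


def build_index(entries):
    """Pack (signature, offset) pairs into one byte string, sorted by signature."""
    return b"".join(ENTRY.pack(sig, off) for sig, off in sorted(entries))


def lookup(index, signature):
    """Binary-search the packed index; return the offset, or None if absent."""
    lo, hi = 0, len(index) // ENTRY.size
    while lo < hi:
        mid = (lo + hi) // 2
        sig, off = ENTRY.unpack_from(index, mid * ENTRY.size)
        if sig == signature:
            return off
        if sig < signature:
            lo = mid + 1
        else:
            hi = mid
    return None
```

Keeping the entries sorted lets a consumer find a type in logarithmic time without reading the rest of the section.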
>> 
>> For an object file that was built using modules, we need to record the fact 
>> that a module has been imported. To this end, we add a DW_TAG_compile_unit 
>> into a COMDAT .debug_info.dwo section that references the split DWARF inside 
>> the module. Similar to split DWARF objects, the module will be identified by 
>> its filename and a checksum. The imported unit also contains a couple of 
>> extra attributes holding all the information necessary to recreate the 
>> module in case the module cache has been flushed.
>> 
>> How does the debugging experience work in this case? When do you trigger the 
>> (possibly-lengthy) rebuild of the source in order to recreate the DWARF for 
>> the module (is it possible to delay that until the information is needed)?
> 
> The module debugging scenario is primarily aimed at providing a better/faster 
> edit-compile-debug cycle. In this scenario, the module would most likely 
> still be in the cache. In a case where the binary was built so long ago that 
> the module cache has since been flushed, it is generally more likely that the 
> user also used a DWARF linking step (such as dsymutil on Darwin, and maybe 
> dwz on Linux?) because they did a release/archive build which would just copy 
> the DWARF out of the module and store it alongside the binary. For this 
> reason I’m not very concerned about the time necessary for rebuilding the 
> module. But this is all very platform-specific, and different platforms may 
> need different defaults.


This description is in terms of building a module that has gone missing, but 
just to be clear: a modules-aware debugger probably also needs to rebuild 
modules that have gone out of date, such as when one of their headers is 
modified.
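
To make the out-of-date case concrete, here is a minimal sketch of a staleness check, written in Python for brevity. Comparing modification times is an assumption made for illustration, not a description of how clang validates a module, and `module_is_stale` is a hypothetical helper.

```python
import os


def module_is_stale(pcm_path, header_paths):
    """Return True if the cached module is missing or older than any header.

    A missing header raises FileNotFoundError; the caller decides what to do.
    """
    try:
        pcm_mtime = os.stat(pcm_path).st_mtime_ns
    except FileNotFoundError:
        return True  # flushed from the cache, so it has to be rebuilt
    return any(os.stat(h).st_mtime_ns > pcm_mtime for h in header_paths)
```

The missing-module case and the modified-header case both end in the same place, a rebuild, which is why a debugger would need to handle both.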

> Delaying the module DWARF output until needed (maybe even by the debugger!) 
> is an interesting idea. We should definitely measure how expensive it is to 
> emit DWARF for an entire module of types to see if this is worthwhile.
> 
>> How much knowledge does the debugger have/need of Clang's modules to do 
>> this? Are we just embedding an arbitrary command that can be run to rebuild 
>> the .dwo if it's missing? And if so, how do we make that safe when (say) 
>> root attaches a debugger to an arbitrary process?
> 
> I think it is reasonable to assume that a consumer that can make use of clang 
> modules also knows how to rebuild clang modules, which is why the example 
> only contained the name of the module, sysroot, include path, and defines; 
> not an arbitrary command. On platforms where the debugger does not understand 
> clang modules, the whole problem can be dodged by treating the modules as 
> explicit build artifacts.

You are probably already aware, but you will need a bunch more information 
(language options, target options, header search options) to rebuild a module.
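
As a rough sketch of what "a bunch more information" could look like as a record, the Python below extends the four fields from the example (name, sysroot, include path, defines) with the three extra groups. The class name `ModuleRebuildInfo`, its field types, and the `context_key` method are all hypothetical; the point is only that any difference in these inputs should be treated as a different module build.

```python
import hashlib
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ModuleRebuildInfo:
    # Fields that appear in the example from the proposal.
    name: str
    sysroot: str
    include_path: str
    defines: tuple = ()
    # Additional inputs (hypothetical representation as tuples of strings).
    language_options: tuple = ()
    target_options: tuple = ()
    header_search_options: tuple = ()

    def context_key(self):
        """Hash every field, so any differing option yields a different key."""
        return hashlib.sha256(repr(astuple(self)).encode("utf-8")).hexdigest()
```

Two records that differ only in, say, a language option produce different keys, so a debugger could tell that a module built under one configuration is not a valid substitute for the other.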

> 
>> 
>> Platforms that treat modules as an explicit build artifact do not have this 
>> problem. In the .debug_info section all types that are defined in the module 
>> are referenced via their unique type signature using DW_FORM_ref_sig8, just 
>> as they would be if this were types from a regular DWARF type unit.
>> 
>> Example
>> -------
>> 
>> Let's say we have a module `MyModule` that defines a type `MyStruct`::
>>  $ cat foo.c
>>  #include <MyModule.h>
>>  MyStruct x;
>> 
>> when compiling `foo.c` like this::
>>  clang -fmodules -gmodules foo.c -c
>> 
>> clang produces `foo.o` and an ELF or Mach-O container for the module::
>>  /path/to/module-cache/MyModule.pcm
>> 
>> In the module container, we have a section for the serialized AST and 
>> split DWARF sections for the debug type info. The exact format is likely 
>> still going to evolve a little, but this should give a rough idea::
>> 
>>  MyModule.pcm:
>>   .debug_info.dwo:
>>     DW_TAG_compile_unit
>>       DW_AT_dwo_name ("/path/to/MyModule.pcm")
>>       DW_AT_dwo_id   ([unique AST signature])
>> 
>>     DW_TAG_type_unit ([hash for MyStruct])
>>        DW_TAG_structure_type
>>           DW_AT_signature ([hash for MyStruct])
>>           DW_AT_name "MyStruct"
>>           ...
>> 
>>   .debug_abbrev.dwo:
>>     // abbrevs referenced by .debug_info.dwo
>>   .debug_line.dwo:
>>     // filenames referenced by .debug_info.dwo
>>   .debug_str.dwo:
>>     // strings referenced by .debug_info.dwo
>> 
>>   .ast
>>     // Index at the top of the AST section sorted by hash value.
>>     [hash for MyStruct] -> [offset for MyStruct in this section]
>>     ...
>>     // Serialized AST follows
>>     ...
>> 
>> The debug info in foo.o will look like this::
>> 
>>  .debug_info.dwo
>>    DW_TAG_compile_unit
>>       // For DWARF consumers
>>       DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm")
>>       DW_AT_dwo_id   ([unique AST signature])
>> 
>>       // For LLDB / dsymutil so they can recreate the module
>>       DW_AT_name "MyModule"
>>       DW_AT_LLVM_system_root "/"
>>       DW_AT_LLVM_preprocessor_defines  "-DNDEBUG"
>>       DW_AT_LLVM_include_path "/path/to/MyModule.map"
>> 
>>  .debug_info
>>    DW_TAG_compile_unit
>>      DW_TAG_variable
>>        DW_AT_name "x"
>>        DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct])
>> 
>> 
>> Type signatures
>> ---------------
>> 
>> We are going to deviate from the DWARF spec by using a more efficient 
>> hashing function that uses the type's unique mangled name and the name of 
>> the module as input.
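
A minimal sketch of a signature computed from those two inputs, assuming a 64-bit result to fit DW_FORM_ref_sig8. The proposal does not name the hash function; BLAKE2b is used here purely as a stand-in, and `type_signature` is a hypothetical helper.

```python
import hashlib


def type_signature(mangled_name, module_name):
    """Return a 64-bit signature from a type's mangled name and its module name."""
    h = hashlib.blake2b(digest_size=8)  # stand-in hash, 8 bytes = 64 bits
    h.update(module_name.encode("utf-8"))
    h.update(b"\0")  # separator, so ("ab", "c") and ("a", "bc") hash differently
    h.update(mangled_name.encode("utf-8"))
    return int.from_bytes(h.digest(), "little")
```

Because the module name is part of the input, the same mangled name (for example `_ZTS8MyStruct`) hashed with two different module names gives two different signatures.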
>> 
>> Why do you need/want the name of the module here? Modules are not a 
>> namespacing mechanism. How would you compute this name when the same type is 
>> defined in multiple imported modules?
> 
> Great point! I’m mostly concerned about non-ODR languages ...
>> 
>> For languages that do not have mangled type names or an ODR,
>> 
>> The people working on C modules have expressed an intent to apply the ODR 
>> there too, so it's not clear that Clang modules will support any such 
>> language in the longer term.
> 
> ... and this may be the answer to the question!
> 
> +Doug: do Objective-C modules have an ODR?
> 
>> 
>> we will use the unique identifiers produced by the clang indexer (USRs) as 
>> input instead.
>> 
>> Extension: Replacing type units with a more efficient storage format
>> --------------------------------------------------------------------
>> 
>> As an extension to this proposal, we are thinking of replacing the type 
>> units within the module debug info with a more efficient format: Instead of 
>> emitting each type into its own type unit (complete with its entire 
>> declcontext), it would be much more efficient to emit one large bag of 
>> DWARF together with an index that maps hash values (type signatures) to DIE 
>> offsets.
>> 
>> Next steps
>> ----------
>> 
>> In order to implement this, the next steps would be as follows:
>> 1. Change the clang module format to be an ELF/Mach-O container.
>> 2. Teach clang to emit debug info for module types (e.g., by passing an 
>> empty compile unit with retained types to LLVM) into the module container.
>> 3a. Add a -gmodules switch to clang that triggers the emission of type 
>> signatures for types coming from a module.
>> 
>> Can you clarify what this flag would do? Does this turn on adding DWARF to 
>> the .pcm file? Does it turn off generating DWARF for imported modules in the 
>> current IR module? Both?
> 
> It would emit references to the type from imported modules instead of the 
> types themselves.
> Since the module cache is shared, we could, depending on just how expensive 
> this is, turn on DWARF generation for .pcm files by default. I’d like to measure 
> this first, though.
> 
>> 
>> I assume this means that the default remains that we build debug information 
>> for modules as if we didn't have modules (that is, put complete DWARF with 
>> the object code). Do you think that's the right long-term default? I think 
>> it's possibly not.
> 
> I think you’re absolutely right about the long term. In the short term, it 
> may be better to have compatibility by default, but I don’t know what the 
> official LLVM policy on new features is, if there is one.
> 
>> 
>> How does this interact with explicit module builds? Can I use a module built 
>> without -g in a compile that uses -g? And if I do, do I get complete debug 
>> information, or debug info just for the parts that aren't in the module? 
>> Does -gmodules let me choose between these?
> 
> Personally I would expect old-style (full copy of the types) debug 
> information if I build against a module that does not have embedded debug 
> information.
> 
> thanks,
> adrian
>> 
>> 3b. Implement type-signature-based lookup in llvm-dsymutil and lldb.
>> 4a. Emit an index that maps type signatures to AST section offsets into the 
>> module container.
>> 4b. Implement direct loading of AST types in lldb.
>> 5a. Improve the efficiency by replacing type units in the module debug info 
>> with a lookup table that maps type signatures to DIE offsets.
>> 5b. Support this format in lldb and llvm-dsymutil.
>> 
>> Let me know what you think!
>> 
>> cheers,
>> Adrian
>> 
>> [1] For more details about clang modules see
>> http://clang.llvm.org/docs/Modules.html and
>> http://clang.llvm.org/docs/PCHInternals.html
>> 
>> 
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-...@cs.uiuc.edu <mailto:cfe-...@cs.uiuc.edu>
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev 
>> <http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev>

_______________________________________________
lldb-dev mailing list
lldb-dev@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev