> On Dec 1, 2014, at 10:57 AM, Adrian Prantl <apra...@apple.com> wrote:
>
>> On Dec 1, 2014, at 10:50 AM, Ben Langmuir <blangm...@apple.com> wrote:
>>
>>> On Dec 1, 2014, at 10:41 AM, Adrian Prantl <apra...@apple.com> wrote:
>>>
>>>> On Dec 1, 2014, at 10:32 AM, Adrian Prantl <apra...@apple.com> wrote:
>>>>
>>>>> On Dec 1, 2014, at 10:27 AM, Ben Langmuir <blangm...@apple.com> wrote:
>>>>>
>>>>>> On Nov 25, 2014, at 5:25 PM, Adrian Prantl <apra...@apple.com> wrote:
>>>>>>
>>>>>>> On Nov 24, 2014, at 4:55 PM, Richard Smith <rich...@metafoo.co.uk> wrote:
>>>>>>>
>>>>>>> On Fri, Nov 21, 2014 at 5:52 PM, Adrian Prantl <apra...@apple.com> wrote:
>>>>>>> Plans for module debugging
>>>>>>> ==========================
>>>>>>>
>>>>>>> I recently had a chat with Eric Christopher and David Blaikie to
>>>>>>> discuss ideas for debug info for Clang modules and this is what we came
>>>>>>> up with.
>>>>>>>
>>>>>>> Goals
>>>>>>> -----
>>>>>>>
>>>>>>> Clang modules [1] (and their siblings C++ modules and precompiled
>>>>>>> header files) are a method for improving compile time by making the
>>>>>>> serialized AST for commonly-used header files directly available to
>>>>>>> the compiler.
>>>>>>>
>>>>>>> Currently debug info is totally oblivious to this: when the developer
>>>>>>> compiles a file that uses a type from a module, clang simply emits a
>>>>>>> copy of the full definition of this type (some exceptions apply for
>>>>>>> C++) in DWARF into the debug info section of the resulting object file.
>>>>>>> That's a lot of copies.
>>>>>>>
>>>>>>> The key idea is to emit DWARF for types defined in modules only once,
>>>>>>> and then only emit references to these types in all the individual
>>>>>>> compile units that import this module. We are going to build on the
>>>>>>> split DWARF and type unit facilities provided by DWARF for this. DWARF
>>>>>>> consumers can follow the type references into the module debug info
>>>>>>> section, much as they resolve types in external type units today.
>>>>>>> Additionally, the format will allow consumers that support clang
>>>>>>> modules natively (such as LLDB) to directly look up types in the
>>>>>>> module, without having to go through the usual translation from AST to
>>>>>>> DWARF and back to AST.
>>>>>>>
>>>>>>> The primary benefit from doing all this is performance. This change is
>>>>>>> expected to reduce the size of the debug info in object files
>>>>>>> significantly by
>>>>>>> - emitting only references to the full types and thus
>>>>>>> - implicitly uniquing types that are defined in modules.
>>>>>>> The smaller object files will result in faster compile times and faster
>>>>>>> llvm::Module load times when doing LTO. The type uniquing will also
>>>>>>> result in significantly smaller debug info for the finished
>>>>>>> executables, especially for C and Objective-C, which do not support
>>>>>>> ODR-based type uniquing. This comes at the price of longer initial
>>>>>>> module build times, as debug info is emitted alongside the module.
>>>>>>>
>>>>>>> Design
>>>>>>> ------
>>>>>>>
>>>>>>> Clang modules are designed to be ephemeral build artifacts that live in
>>>>>>> a shared module cache. Compiling a source file that imports `MyModule`
>>>>>>> results in `MyModule.pcm` being generated in the module cache directory;
>>>>>>> it contains the serialized AST of the declarations found in the
>>>>>>> header files that comprise the module.
>>>>>>>
>>>>>>> We will change the binary clang module format to become a container
>>>>>>> (ELF, Mach-O, depending on the platform). Inside the container there
>>>>>>> will be multiple sections: one containing the serialized AST, and others
>>>>>>> containing DWARF5 split debug type information for all types defined in
>>>>>>> the module that can be encoded in DWARF. By virtue of using type units,
>>>>>>> each type is emitted into its own type unit which can be identified via
>>>>>>> a unique type signature. DWARF consumers can use the type signatures to
>>>>>>> look up type definitions in the module debug info section. For
>>>>>>> module-aware consumers (LLDB), we will add an index that maps type
>>>>>>> signatures directly to an offset in the AST section.
>>>>>>>
>>>>>>> For an object file that was built using modules, we need to record the
>>>>>>> fact that a module has been imported. To this end, we add a
>>>>>>> DW_TAG_compile_unit into a COMDAT .debug_info.dwo section that
>>>>>>> references the split DWARF inside the module. Similar to split DWARF
>>>>>>> objects, the module will be identified by its filename and a checksum.
>>>>>>> The imported unit also contains a couple of extra attributes holding
>>>>>>> all the information necessary to recreate the module in case the module
>>>>>>> cache has been flushed.
>>>>>>>
>>>>>>> How does the debugging experience work in this case? When do you
>>>>>>> trigger the (possibly-lengthy) rebuild of the source in order to
>>>>>>> recreate the DWARF for the module (is it possible to delay that until
>>>>>>> the information is needed)?
>>>>>>
>>>>>> The module debugging scenario is primarily aimed at providing a
>>>>>> better/faster edit-compile-debug cycle. In this scenario, the module
>>>>>> would most likely still be in the cache.
>>>>>> In a case where the binary was
>>>>>> built so long ago that the module cache has since been flushed, it is
>>>>>> generally more likely that the user also used a DWARF linking step (such
>>>>>> as dsymutil on Darwin, and maybe dwz on Linux?) because they did a
>>>>>> release/archive build, which would just copy the DWARF out of the module
>>>>>> and store it alongside the binary. For this reason I’m not very
>>>>>> concerned about the time necessary for rebuilding the module. But this
>>>>>> is all very platform-specific, and different platforms may need
>>>>>> different defaults.
>>>>>
>>>>> This description is in terms of building a module that has gone missing,
>>>>> but just to be clear: a modules-aware debugger probably also needs to
>>>>> rebuild modules that have gone out of date, such as when one of their
>>>>> headers is modified.
>>>>
>>>> In this case, where the module is out of date, the debugger should probably
>>>> fall back to the DWARF types, because it cannot guarantee that the
>>>> modifications to the header files did not change the types it wants to
>>>> look up.
>>>
>>> Sorry, I just realized that this doesn’t make any sense if the DWARF is
>>> stored in the module. The behavior should be:
>>> 1. If the module is missing, recreate the module.
>>> 2. If the module signature does not match the signature in the .o file,
>>> either print a large warning that types from that module may be bogus, or
>>> categorically refuse to use them.
>>
>> Maybe this is described elsewhere, but what is the “signature” being used
>> here? Assuming it depends on the detailed contents of the serialized AST:
>> currently ASTWriter output is nondeterministic and things like the ID#s for
>> identifiers, types, etc. will change every time you build the module; until
>> that gets fixed, we would always hit case (2).
>
> I was actually hoping that we could rely on deterministic output from clang.
> If it is infeasible to make ASTWriter output deterministic, we can fall back to
> something like the DWARF dwo_id signature here.
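The two cases quoted above (module missing: recreate it; signature mismatch: warn or refuse) could be sketched roughly as follows, in Python for brevity. This is only an illustration under stated assumptions: `ModuleStatus` and `check_module` are hypothetical names, not part of clang or LLDB, and the signature is assumed to be a plain integer such as a dwo_id-style value.

```python
from enum import Enum
from typing import Optional


class ModuleStatus(Enum):
    USE_MODULE = "use types from the cached module"
    REBUILD = "module missing: recreate it"
    WARN_OR_REFUSE = "signature mismatch: warn loudly or refuse"


def check_module(cached_signature: Optional[int],
                 object_signature: int) -> ModuleStatus:
    """Hypothetical consumer-side check.

    cached_signature is None when the .pcm is not in the module cache;
    object_signature is the value recorded in the .o file.
    """
    if cached_signature is None:
        return ModuleStatus.REBUILD          # case 1
    if cached_signature != object_signature:
        return ModuleStatus.WARN_OR_REFUSE   # case 2
    return ModuleStatus.USE_MODULE
```

If the signature depends on nondeterministic ASTWriter output, a freshly rebuilt module would carry a different signature than the one stored in the .o file, so this check would land in case 2 after every rebuild.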
I think everyone agrees that deterministic output is a good idea. Last I
heard, Richard had indicated some interest in tackling this problem.

Ben

>
> -- adrian
>
>>
>>>
>>> For long-term debugging, users are expected to use a DWARF linker (dsymutil,
>>> dwz), which archives all types in a future-proof format (DWARF).
>>>
>>> -- adrian
>>>
>>>>
>>>>>
>>>>>> Delaying the module DWARF output until needed (maybe even by the
>>>>>> debugger!) is an interesting idea. We should definitely measure how
>>>>>> expensive it is to emit DWARF for an entire module of types to see
>>>>>> if this is worthwhile.
>>>>>>
>>>>>>> How much knowledge does the debugger have/need of Clang's modules to do
>>>>>>> this? Are we just embedding an arbitrary command that can be run to
>>>>>>> rebuild the .dwo if it's missing? And if so, how do we make that safe
>>>>>>> when (say) root attaches a debugger to an arbitrary process?
>>>>>>
>>>>>> I think it is reasonable to assume that a consumer that can make use of
>>>>>> clang modules also knows how to rebuild clang modules, which is why the
>>>>>> example only contained the name of the module, sysroot, include path,
>>>>>> and defines; not an arbitrary command. On platforms where the debugger
>>>>>> does not understand clang modules, the whole problem can be dodged by
>>>>>> treating the modules as explicit build artifacts.
>>>>>
>>>>> You are probably already aware, but you will need a bunch more
>>>>> information (language options, target options, header search options) to
>>>>> rebuild a module.
>>>>
>>>> Thanks, language options and target options were absent from the list
>>>> previously!
>>>>
>>>> -- adrian
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Platforms that treat modules as an explicit build artifact do not have
>>>>>>> this problem.
>>>>>>> In the .debug_info section all types that are defined in
>>>>>>> the module are referenced via their unique type signature using
>>>>>>> DW_FORM_ref_sig8, just as they would be if these were types from a
>>>>>>> regular DWARF type unit.
>>>>>>>
>>>>>>> Example
>>>>>>> -------
>>>>>>>
>>>>>>> Let's say we have a module `MyModule` that defines a type `MyStruct`::
>>>>>>>   $ cat foo.c
>>>>>>>   #include <MyModule.h>
>>>>>>>   MyStruct x;
>>>>>>>
>>>>>>> when compiling `foo.c` like this::
>>>>>>>   clang -fmodules -gmodules foo.c -c
>>>>>>>
>>>>>>> clang produces `foo.o` and an ELF or Mach-O container for the module::
>>>>>>>   /path/to/module-cache/MyModule.pcm
>>>>>>>
>>>>>>> In the module container, we have a section for the serialized AST and
>>>>>>> split DWARF sections for the debug type info. The exact format is
>>>>>>> likely still going to evolve a little, but this should give a rough
>>>>>>> idea::
>>>>>>>
>>>>>>>   MyModule.pcm:
>>>>>>>     .debug_info.dwo:
>>>>>>>       DW_TAG_compile_unit
>>>>>>>         DW_AT_dwo_name ("/path/to/MyModule.pcm")
>>>>>>>         DW_AT_dwo_id ([unique AST signature])
>>>>>>>
>>>>>>>       DW_TAG_type_unit ([hash for MyStruct])
>>>>>>>         DW_TAG_structure_type
>>>>>>>           DW_AT_signature ([hash for MyStruct])
>>>>>>>           DW_AT_name "MyStruct"
>>>>>>>           ...
>>>>>>>
>>>>>>>     .debug_abbrev.dwo:
>>>>>>>       // abbrevs referenced by .debug_info.dwo
>>>>>>>     .debug_line.dwo:
>>>>>>>       // filenames referenced by .debug_info.dwo
>>>>>>>     .debug_str.dwo:
>>>>>>>       // strings referenced by .debug_info.dwo
>>>>>>>
>>>>>>>     .ast
>>>>>>>       // Index at the top of the AST section sorted by hash value.
>>>>>>>       [hash for MyStruct] -> [offset for MyStruct in this section]
>>>>>>>       ...
>>>>>>>       // Serialized AST follows
>>>>>>>       ...
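To illustrate the sorted index at the top of the `.ast` section in the listing above, here is a rough Python sketch of how a module-aware consumer might map a type signature to an AST offset with a binary search. The on-disk encoding of the index is not specified in the proposal; `AstIndex` and its in-memory layout are assumptions made only for this example.

```python
from bisect import bisect_left
from typing import Iterable, List, Optional, Tuple


class AstIndex:
    """Hypothetical in-memory view of the index: signatures sorted ascending,
    each paired with an offset into the AST section."""

    def __init__(self, entries: Iterable[Tuple[int, int]]) -> None:
        ordered = sorted(entries)  # sort by signature (hash value)
        self._signatures: List[int] = [sig for sig, _ in ordered]
        self._offsets: List[int] = [off for _, off in ordered]

    def lookup(self, signature: int) -> Optional[int]:
        """Return the AST offset for signature, or None if it is absent."""
        pos = bisect_left(self._signatures, signature)
        if pos < len(self._signatures) and self._signatures[pos] == signature:
            return self._offsets[pos]
        return None


# Usage with made-up signature and offset values.
index = AstIndex([(0x30, 200), (0x10, 64), (0x20, 128)])
offset = index.lookup(0x20)
```

Keeping the entries sorted by hash value is what makes the logarithmic lookup possible; a consumer that finds no entry would fall back to the DWARF type unit with the same signature.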
>>>>>>>
>>>>>>> The debug info in foo.o will look like this::
>>>>>>>
>>>>>>>   .debug_info.dwo
>>>>>>>     DW_TAG_compile_unit
>>>>>>>       // For DWARF consumers
>>>>>>>       DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm")
>>>>>>>       DW_AT_dwo_id ([unique AST signature])
>>>>>>>
>>>>>>>       // For LLDB / dsymutil so they can recreate the module
>>>>>>>       DW_AT_name "MyModule"
>>>>>>>       DW_AT_LLVM_system_root "/"
>>>>>>>       DW_AT_LLVM_preprocessor_defines "-DNDEBUG"
>>>>>>>       DW_AT_LLVM_include_path "/path/to/MyModule.map"
>>>>>>>
>>>>>>>   .debug_info
>>>>>>>     DW_TAG_compile_unit
>>>>>>>       DW_TAG_variable
>>>>>>>         DW_AT_name "x"
>>>>>>>         DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct])
>>>>>>>
>>>>>>>
>>>>>>> Type signatures
>>>>>>> ---------------
>>>>>>>
>>>>>>> We are going to deviate from the DWARF spec by using a more efficient
>>>>>>> hashing function that uses the type's unique mangled name and the name
>>>>>>> of the module as input.
>>>>>>>
>>>>>>> Why do you need/want the name of the module here? Modules are not a
>>>>>>> namespacing mechanism. How would you compute this name when the same
>>>>>>> type is defined in multiple imported modules?
>>>>>>
>>>>>> Great point! I’m mostly concerned about non-ODR languages ...
>>>>>>>
>>>>>>> For languages that do not have mangled type names or an ODR,
>>>>>>>
>>>>>>> The people working on C modules have expressed an intent to apply the
>>>>>>> ODR there too, so it's not clear that Clang modules will support any
>>>>>>> such language in the longer term.
>>>>>>
>>>>>> ... and this may be the answer to the question!
>>>>>>
>>>>>> +Doug: do Objective-C modules have an ODR?
>>>>>>
>>>>>>>
>>>>>>> we will use the unique identifiers produced by the clang indexer (USRs)
>>>>>>> as input instead.
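As a rough illustration of deriving an 8-byte type signature (the size DW_FORM_ref_sig8 refers to) from a type's unique name, here is a short Python sketch. The thread does not name the hash function, so BLAKE2b truncated to 8 bytes is only a stand-in, `type_signature` is a hypothetical helper, and the USR-style string is just an example input. Whether the module name should be part of the input is left open in the discussion above, so it is not hashed here.

```python
import hashlib


def type_signature(unique_name: str) -> int:
    """Map a unique type name (a mangled name or a USR) to a 64-bit value.

    BLAKE2b is an assumption for this sketch, not the function the
    proposal would use.
    """
    digest = hashlib.blake2b(unique_name.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


# Example input in the style of a clang USR for a C struct (illustrative only).
sig = type_signature("c:@S@MyStruct")
```

Because the input is only the type's name, two compile units that see the same type compute the same signature independently, which is what lets a reference in a .o file be matched against the type unit in the module.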
>>>>>>>
>>>>>>> Extension: Replacing type units with a more efficient storage format
>>>>>>> --------------------------------------------------------------------
>>>>>>>
>>>>>>> As an extension to this proposal, we are thinking of replacing the type
>>>>>>> units within the module debug info with a more efficient format:
>>>>>>> Instead of emitting each type into its own type unit (complete with its
>>>>>>> entire declcontext), it would be much more efficient to emit one
>>>>>>> large bag of DWARF together with an index that maps hash values (type
>>>>>>> signatures) to DIE offsets.
>>>>>>>
>>>>>>> Next steps
>>>>>>> ----------
>>>>>>>
>>>>>>> In order to implement this, the next steps would be as follows:
>>>>>>> 1. Change the clang module format to be an ELF/Mach-O container.
>>>>>>> 2. Teach clang to emit debug info for module types (e.g., by passing an
>>>>>>> empty compile unit with retained types to LLVM) into the module
>>>>>>> container.
>>>>>>> 3a. Add a -gmodules switch to clang that triggers the emission of type
>>>>>>> signatures for types coming from a module.
>>>>>>>
>>>>>>> Can you clarify what this flag would do? Does this turn on adding DWARF
>>>>>>> to the .pcm file? Does it turn off generating DWARF for imported
>>>>>>> modules in the current IR module? Both?
>>>>>>
>>>>>> It would emit references to the types from imported modules instead of
>>>>>> the types themselves.
>>>>>> Since the module cache is shared, we could, depending on just how
>>>>>> expensive this is, turn on DWARF generation for .pcm files by default.
>>>>>> I’d like to measure this first, though.
>>>>>>
>>>>>>>
>>>>>>> I assume this means that the default remains that we build debug
>>>>>>> information for modules as if we didn't have modules (that is, put
>>>>>>> complete DWARF with the object code). Do you think that's the right
>>>>>>> long-term default? I think it's possibly not.
>>>>>>
>>>>>> I think you’re absolutely right about the long term.
>>>>>> In the short term,
>>>>>> it may be better to have compatibility by default, but I don’t know what
>>>>>> the official LLVM policy on new features is, if there is one.
>>>>>>
>>>>>>>
>>>>>>> How does this interact with explicit module builds? Can I use a module
>>>>>>> built without -g in a compile that uses -g? And if I do, do I get
>>>>>>> complete debug information, or debug info just for the parts that
>>>>>>> aren't in the module? Does -gmodules let me choose between these?
>>>>>>
>>>>>> Personally I would expect old-style (full copy of the types) debug
>>>>>> information if I build against a module that does not have embedded debug
>>>>>> information.
>>>>>>
>>>>>> thanks,
>>>>>> adrian
>>>>>>>
>>>>>>> 3b. Implement type-signature-based lookup in llvm-dsymutil and lldb.
>>>>>>> 4a. Emit an index that maps type signatures to AST section offsets into
>>>>>>> the module container.
>>>>>>> 4b. Implement direct loading of AST types in lldb.
>>>>>>> 5a. Improve the efficiency by replacing type units in the module debug
>>>>>>> info with a lookup table that maps type signatures to DIE offsets.
>>>>>>> 5b. Support this format in lldb and llvm-dsymutil.
>>>>>>>
>>>>>>> Let me know what you think!
>>>>>>>
>>>>>>> cheers,
>>>>>>> Adrian
>>>>>>>
>>>>>>> [1] For more details about clang modules see
>>>>>>> http://clang.llvm.org/docs/Modules.html and
>>>>>>> http://clang.llvm.org/docs/PCHInternals.html
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> cfe-dev mailing list
>>>>>>> cfe-...@cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
_______________________________________________ lldb-dev mailing list lldb-dev@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev