> On Nov 25, 2014, at 5:25 PM, Adrian Prantl <apra...@apple.com> wrote:
>
>>
>> On Nov 24, 2014, at 4:55 PM, Richard Smith <rich...@metafoo.co.uk> wrote:
>>
>> On Fri, Nov 21, 2014 at 5:52 PM, Adrian Prantl <apra...@apple.com> wrote:
>> Plans for module debugging
>> ==========================
>>
>> I recently had a chat with Eric Christopher and David Blaikie to discuss
>> ideas for debug info for Clang modules and this is what we came up with.
>>
>> Goals
>> -----
>>
>> Clang modules [1] (and their siblings C++ modules and precompiled header
>> files) are a method for improving compile time by making the serialized AST
>> for commonly-used header files directly available to the compiler.
>>
>> Currently debug info is totally oblivious to this: when the developer
>> compiles a file that uses a type from a module, clang simply emits a copy of
>> the full definition (some exceptions apply for C++) of this type in DWARF
>> into the debug info section of the resulting object file. That's a lot of
>> copies.
>>
>> The key idea is to emit DWARF for types defined in modules only once, and
>> then only emit references to these types in all the individual compile units
>> that import this module. We are going to build on the split DWARF and type
>> unit facilities provided by DWARF for this. DWARF consumers can follow the
>> type references into the module debug info section, much as they
>> resolve types in external type units today. Additionally, the format will
>> allow consumers that support clang modules natively (such as LLDB) to
>> directly look up types in the module, without having to go through the usual
>> translation from AST to DWARF and back to AST.
>>
>> The primary benefit from doing all this is performance.
>> This change is expected to reduce the size of the debug info in object
>> files significantly by
>> - emitting only references to the full types and thus
>> - implicitly uniquing types that are defined in modules.
>> The smaller object files will result in faster compile times and faster
>> llvm::Module load times when doing LTO. The type uniquing will also result
>> in significantly smaller debug info for the finished executables, especially
>> for C and Objective-C, which do not support ODR-based type uniquing. This
>> comes at the price of longer initial module build times, as debug info is
>> emitted alongside the module.
>>
>> Design
>> ------
>>
>> Clang modules are designed to be ephemeral build artifacts that live in a
>> shared module cache. Compiling a source file that imports `MyModule` results
>> in `MyModule.pcm` being generated in the module cache directory. This file
>> contains the serialized AST of the declarations found in the header files
>> that comprise the module.
>>
>> We will change the binary clang module format to become a container (ELF,
>> Mach-O, depending on the platform). Inside the container there will be
>> multiple sections: one containing the serialized AST, and ones containing
>> DWARF5 split debug type information for all types defined in the module that
>> can be encoded in DWARF. By virtue of using type units, each type is emitted
>> into its own type unit which can be identified via a unique type signature.
>> DWARF consumers can use the type signatures to look up type definitions in
>> the module debug info section. For module-aware consumers (LLDB), we will
>> add an index that maps type signatures directly to an offset in the AST
>> section.
>>
>> For an object file that was built using modules, we need to record the fact
>> that a module has been imported. To this end, we add a DW_TAG_compile_unit
>> into a COMDAT .debug_info.dwo section that references the split DWARF inside
>> the module.
>> Similar to split DWARF objects, the module will be identified by
>> its filename and a checksum. The imported unit also contains a couple of
>> extra attributes holding all the information necessary to recreate the
>> module in case the module cache has been flushed.
>>
>> How does the debugging experience work in this case? When do you trigger the
>> (possibly-lengthy) rebuild of the source in order to recreate the DWARF for
>> the module (is it possible to delay that until the information is needed)?
>
> The module debugging scenario is primarily aimed at providing a better/faster
> edit-compile-debug cycle. In this scenario, the module would most likely
> still be in the cache. In a case where the binary was built so long ago that
> the module cache has since been flushed, it is generally more likely that the
> user also used a DWARF linking step (such as dsymutil on Darwin, and maybe
> dwz on Linux?) because they did a release/archive build which would just copy
> the DWARF out of the module and store it alongside the binary. For this
> reason I’m not very concerned about the time necessary for rebuilding the
> module. But this is all very platform-specific, and different platforms may
> need different defaults.

This description is in terms of building a module that has gone missing, but just to be clear: a modules-aware debugger probably also needs to rebuild modules that have gone out of date, such as when one of their headers is modified.

> Delaying the module DWARF output until needed (maybe even by the debugger!)
> is an interesting idea. We should definitely measure how expensive it is to
> emit DWARF for an entire module's worth of types to see if this is worthwhile.
>
>> How much knowledge does the debugger have/need of Clang's modules to do
>> this? Are we just embedding an arbitrary command that can be run to rebuild
>> the .dwo if it's missing? And if so, how do we make that safe when (say)
>> root attaches a debugger to an arbitrary process?
>
> I think it is reasonable to assume that a consumer that can make use of clang
> modules also knows how to rebuild clang modules, which is why the example
> only contained the name of the module, sysroot, include path, and defines;
> not an arbitrary command. On platforms where the debugger does not understand
> clang modules, the whole problem can be dodged by treating the modules as
> explicit build artifacts.

You are probably already aware, but you will need a bunch more information (language options, target options, header search options) to rebuild a module.

>
>>
>> Platforms that treat modules as an explicit build artifact do not have this
>> problem. In the .debug_info section all types that are defined in the module
>> are referenced via their unique type signature using DW_FORM_ref_sig8, just
>> as they would be if these were types from a regular DWARF type unit.
>>
>> Example
>> -------
>>
>> Let's say we have a module `MyModule` that defines a type `MyStruct`::
>>   $ cat foo.c
>>   #include <MyModule.h>
>>   MyStruct x;
>>
>> when compiling `foo.c` like this::
>>   clang -fmodules -gmodules foo.c -c
>>
>> clang produces `foo.o` and an ELF or Mach-O container for the module::
>>   /path/to/module-cache/MyModule.pcm
>>
>> In the module container, we have a section for the serialized AST and
>> split DWARF sections for the debug type info. The exact format is likely
>> still going to evolve a little, but this should give a rough idea::
>>
>>   MyModule.pcm:
>>     .debug_info.dwo:
>>       DW_TAG_compile_unit
>>         DW_AT_dwo_name ("/path/to/MyModule.pcm")
>>         DW_AT_dwo_id ([unique AST signature])
>>
>>       DW_TAG_type_unit ([hash for MyStruct])
>>         DW_TAG_structure_type
>>           DW_AT_signature ([hash for MyStruct])
>>           DW_AT_name "MyStruct"
>>           ...
>>
>>     .debug_abbrev.dwo:
>>       // abbrevs referenced by .debug_info.dwo
>>     .debug_line.dwo:
>>       // filenames referenced by .debug_info.dwo
>>     .debug_str.dwo:
>>       // strings referenced by .debug_info.dwo
>>
>>     .ast
>>       // Index at the top of the AST section sorted by hash value.
>>       [hash for MyStruct] -> [offset for MyStruct in this section]
>>       ...
>>       // Serialized AST follows
>>       ...
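
The index at the top of the `.ast` section is described as sorted by hash value, which suggests that a module-aware consumer could find a type with a binary search. The following is only a rough sketch of that idea. The on-disk layout (a 32-bit entry count followed by pairs of 64-bit signature and 64-bit offset) and the names `build_index` and `lookup` are assumptions made for illustration; the proposal does not specify the encoding.

```python
import struct

# Hypothetical layout, NOT part of the proposal: a little-endian u32 entry
# count, then (u64 signature, u64 offset) pairs sorted by signature.
COUNT = struct.Struct("<I")
ENTRY = struct.Struct("<QQ")


def build_index(entries):
    """Serialize a {signature: offset} mapping as a table sorted by signature."""
    items = sorted(entries.items())
    blob = COUNT.pack(len(items))
    for signature, offset in items:
        blob += ENTRY.pack(signature, offset)
    return blob


def lookup(blob, signature):
    """Binary-search the sorted table; return the offset, or None if absent."""
    (count,) = COUNT.unpack_from(blob, 0)
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        sig, offset = ENTRY.unpack_from(blob, COUNT.size + mid * ENTRY.size)
        if sig == signature:
            return offset
        if sig < signature:
            lo = mid + 1
        else:
            hi = mid
    return None


# Example: three made-up signatures pointing at made-up section offsets.
index = build_index({0x1111: 64, 0xABCDEF: 128, 0x5: 32})
print(lookup(index, 0xABCDEF))
```

A sorted table keeps the lookup logarithmic in the number of types without needing the whole index to be parsed up front; a hash table keyed on the signature would be an equally plausible choice.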
>>
>> The debug info in foo.o will look like this::
>>
>>   .debug_info.dwo
>>     DW_TAG_compile_unit
>>       // For DWARF consumers
>>       DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm")
>>       DW_AT_dwo_id ([unique AST signature])
>>
>>       // For LLDB / dsymutil so they can recreate the module
>>       DW_AT_name "MyModule"
>>       DW_AT_LLVM_system_root "/"
>>       DW_AT_LLVM_preprocessor_defines "-DNDEBUG"
>>       DW_AT_LLVM_include_path "/path/to/MyModule.map"
>>
>>   .debug_info
>>     DW_TAG_compile_unit
>>       DW_TAG_variable
>>         DW_AT_name "x"
>>         DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct])
>>
>>
>> Type signatures
>> ---------------
>>
>> We are going to deviate from the DWARF spec by using a more efficient
>> hashing function that uses the type's unique mangled name and the name of
>> the module as input.
>>
>> Why do you need/want the name of the module here? Modules are not a
>> namespacing mechanism. How would you compute this name when the same type is
>> defined in multiple imported modules?
>
> Great point! I’m mostly concerned about non-ODR languages ...
>>
>> For languages that do not have mangled type names or an ODR,
>>
>> The people working on C modules have expressed an intent to apply the ODR
>> there too, so it's not clear that Clang modules will support any such
>> language in the longer term.
>
> ... and this may be the answer to the question!
>
> +Doug: do Objective-C modules have an ODR?
>
>>
>> we will use the unique identifiers produced by the clang indexer (USRs) as
>> input instead.
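
To make the signature discussion concrete, here is a small sketch of deriving a 64-bit signature from a type's unique name. Everything specific in it is an assumption: the proposal does not name a hash function, so MD5 truncated to 8 bytes is used here only as a stand-in, and `type_signature` and the example names are hypothetical. The module name is left optional because the thread has not settled whether it should be part of the input.

```python
import hashlib


def type_signature(unique_name, module_name=None):
    """Return a 64-bit signature for a type.

    unique_name: the mangled type name, or a USR for languages without
                 mangled type names.
    module_name: optional; whether it belongs in the hash is an open
                 question in this thread.
    The hash (MD5 truncated to 8 bytes) is an assumption for illustration.
    """
    h = hashlib.md5()
    if module_name is not None:
        h.update(module_name.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    h.update(unique_name.encode("utf-8"))
    return int.from_bytes(h.digest()[-8:], "little")


# Illustrative inputs only; "_ZTS8MyStruct" stands in for a mangled name.
with_module = type_signature("_ZTS8MyStruct", "MyModule")
without_module = type_signature("_ZTS8MyStruct")
print(hex(with_module), hex(without_module))
```

Hashing only the unique name means that the same type reached through two different modules gets the same signature, which is the behavior the question about multiply-defined types is asking for.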
>>
>> Extension: Replacing type units with a more efficient storage format
>> --------------------------------------------------------------------
>>
>> As an extension to this proposal, we are thinking of replacing the type
>> units within the module debug info with a more efficient format: Instead of
>> emitting each type into its own type unit (complete with its entire
>> declcontext), it would be much more efficient to emit one large bag of
>> DWARF together with an index that maps hash values (type signatures) to DIE
>> offsets.
>>
>> Next steps
>> ----------
>>
>> In order to implement this, the next steps would be as follows:
>> 1. Change the clang module format to be an ELF/Mach-O container.
>> 2. Teach clang to emit debug info for module types (e.g., by passing an
>> empty compile unit with retained types to LLVM) into the module container.
>> 3a. Add a -gmodules switch to clang that triggers the emission of type
>> signatures for types coming from a module.
>>
>> Can you clarify what this flag would do? Does this turn on adding DWARF to
>> the .pcm file? Does it turn off generating DWARF for imported modules in the
>> current IR module? Both?
>
> It would emit references to the types from imported modules instead of the
> types themselves.
> Since the module cache is shared, we could, depending on just how expensive
> this is, turn on DWARF generation for .pcm files by default. I’d like to
> measure this first, though.
>
>>
>> I assume this means that the default remains that we build debug information
>> for modules as if we didn't have modules (that is, put complete DWARF with
>> the object code). Do you think that's the right long-term default? I think
>> it's possibly not.
>
> I think you’re absolutely right about the long term. In the short term, it
> may be better to have compatibility by default, but I don’t know what the
> official LLVM policy on new features is, if there is one.
>
>>
>> How does this interact with explicit module builds?
>> Can I use a module built
>> without -g in a compile that uses -g? And if I do, do I get complete debug
>> information, or debug info just for the parts that aren't in the module?
>> Does -gmodules let me choose between these?
>
> Personally I would expect old-style (full copy of the types) debug
> information if I build against a module that does not have embedded debug
> information.
>
> thanks,
> adrian
>>
>> 3b. Implement type-signature-based lookup in llvm-dsymutil and lldb.
>> 4a. Emit an index that maps type signatures to AST section offsets into the
>> module container.
>> 4b. Implement direct loading of AST types in lldb.
>> 5a. Improve the efficiency by replacing type units in the module debug info
>> with a lookup table that maps type signatures to DIE offsets.
>> 5b. Support this format in lldb and llvm-dsymutil.
>>
>> Let me know what you think!
>>
>> cheers,
>> Adrian
>>
>> [1] For more details about clang modules see
>> http://clang.llvm.org/docs/Modules.html and
>> http://clang.llvm.org/docs/PCHInternals.html
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-...@cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
_______________________________________________
lldb-dev mailing list
lldb-dev@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev