> On Dec 1, 2014, at 10:32 AM, Adrian Prantl <apra...@apple.com> wrote:
>
>>
>> On Dec 1, 2014, at 10:27 AM, Ben Langmuir <blangm...@apple.com> wrote:
>>
>>
>>> On Nov 25, 2014, at 5:25 PM, Adrian Prantl <apra...@apple.com> wrote:
>>>
>>>>
>>>> On Nov 24, 2014, at 4:55 PM, Richard Smith <rich...@metafoo.co.uk> wrote:
>>>>
>>>> On Fri, Nov 21, 2014 at 5:52 PM, Adrian Prantl <apra...@apple.com> wrote:
>>>> Plans for module debugging
>>>> ==========================
>>>>
>>>> I recently had a chat with Eric Christopher and David Blaikie to discuss
>>>> ideas for debug info for Clang modules and this is what we came up with.
>>>>
>>>> Goals
>>>> -----
>>>>
>>>> Clang modules [1] (and their siblings, C++ modules and precompiled header
>>>> files) are a method for improving compile time by making the serialized
>>>> AST for commonly used header files directly available to the compiler.
>>>>
>>>> Currently debug info is totally oblivious to this: when the developer
>>>> compiles a file that uses a type from a module, clang simply emits a copy
>>>> of the full definition (some exceptions apply for C++) of this type in
>>>> DWARF into the debug info section of the resulting object file. That's a
>>>> lot of copies.
>>>>
>>>> The key idea is to emit DWARF for types defined in modules only once, and
>>>> then only emit references to these types in all the individual compile
>>>> units that import this module. We are going to build on the split DWARF
>>>> and type unit facilities provided by DWARF for this. DWARF consumers can
>>>> follow the type references into the module debug info section, much as
>>>> they resolve types in external type units today.
>>>> Additionally, the format will allow consumers that support clang modules
>>>> natively (such as LLDB) to directly look up types in the module, without
>>>> having to go through the usual translation from AST to DWARF and back to
>>>> AST.
>>>>
>>>> The primary benefit from doing all this is performance. This change is
>>>> expected to reduce the size of the debug info in object files
>>>> significantly by
>>>> - emitting only references to the full types and thus
>>>> - implicitly uniquing types that are defined in modules.
>>>> The smaller object files will result in faster compile times and faster
>>>> llvm::Module load times when doing LTO. The type uniquing will also result
>>>> in significantly smaller debug info for the finished executables,
>>>> especially for C and Objective-C, which do not support ODR-based type
>>>> uniquing. This comes at the price of longer initial module build times, as
>>>> debug info is emitted alongside the module.
>>>>
>>>> Design
>>>> ------
>>>>
>>>> Clang modules are designed to be ephemeral build artifacts that live in a
>>>> shared module cache. Compiling a source file that imports `MyModule`
>>>> results in `MyModule.pcm` being generated in the module cache directory;
>>>> this file contains the serialized AST of the declarations found in the
>>>> header files that comprise the module.
>>>>
>>>> We will change the binary clang module format to become a container (ELF
>>>> or Mach-O, depending on the platform). Inside the container there will be
>>>> multiple sections: one containing the serialized AST, and others
>>>> containing DWARF5 split debug type information for all types defined in
>>>> the module that can be encoded in DWARF. By virtue of using type units,
>>>> each type is emitted into its own type unit, which can be identified via a
>>>> unique type signature. DWARF consumers can use the type signatures to look
>>>> up type definitions in the module debug info section.
>>>> For module-aware consumers (LLDB), we will add an index that maps type
>>>> signatures directly to an offset in the AST section.
>>>>
>>>> For an object file that was built using modules, we need to record the
>>>> fact that a module has been imported. To this end, we add a
>>>> DW_TAG_compile_unit into a COMDAT .debug_info.dwo section that references
>>>> the split DWARF inside the module. Similar to split DWARF objects, the
>>>> module will be identified by its filename and a checksum. The imported
>>>> unit also contains a couple of extra attributes holding all the
>>>> information necessary to recreate the module in case the module cache has
>>>> been flushed.
>>>>
>>>> How does the debugging experience work in this case? When do you trigger
>>>> the (possibly-lengthy) rebuild of the source in order to recreate the
>>>> DWARF for the module (is it possible to delay that until the information
>>>> is needed)?
>>>
>>> The module debugging scenario is primarily aimed at providing a
>>> better/faster edit-compile-debug cycle. In this scenario, the module would
>>> most likely still be in the cache. In a case where the binary was built so
>>> long ago that the module cache has since been flushed, it is generally more
>>> likely that the user also used a DWARF linking step (such as dsymutil on
>>> Darwin, and maybe dwz on Linux?) because they did a release/archive build,
>>> which would just copy the DWARF out of the module and store it alongside
>>> the binary. For this reason I’m not very concerned about the time necessary
>>> for rebuilding the module. But this is all very platform-specific, and
>>> different platforms may need different defaults.
>>
>> This description is in terms of building a module that has gone missing, but
>> just to be clear: a modules-aware debugger probably also needs to rebuild
>> modules that have gone out of date, such as when one of their headers is
>> modified.
>
> In the case where the module is out of date, the debugger should probably
> fall back to the DWARF types, because it cannot guarantee that the
> modifications to the header files did not change the types it wants to look
> up.
Are you also worried about this when the debugger builds a module that has
gone missing? At that point there is nothing to fall back to, and rebuilding
the module could produce incorrect information.

>
>>
>>> Delaying the module DWARF output until needed (maybe even by the debugger!)
>>> is an interesting idea. We should definitely measure how expensive it is to
>>> emit DWARF for an entire module's worth of types to see if this is
>>> worthwhile.
>>>
>>>> How much knowledge does the debugger have/need of Clang's modules to do
>>>> this? Are we just embedding an arbitrary command that can be run to
>>>> rebuild the .dwo if it's missing? And if so, how do we make that safe when
>>>> (say) root attaches a debugger to an arbitrary process?
>>>
>>> I think it is reasonable to assume that a consumer that can make use of
>>> clang modules also knows how to rebuild clang modules, which is why the
>>> example only contained the name of the module, sysroot, include path, and
>>> defines; not an arbitrary command. On platforms where the debugger does not
>>> understand clang modules, the whole problem can be dodged by treating the
>>> modules as explicit build artifacts.
>>
>> You are probably already aware, but you will need a bunch more information
>> (language options, target options, header search options) to rebuild a
>> module.
>
> Thanks, language options and target options were absent from the list
> previously!
>
> -- adrian
>>
>>>
>>>>
>>>> Platforms that treat modules as an explicit build artifact do not have
>>>> this problem. In the .debug_info section all types that are defined in the
>>>> module are referenced via their unique type signature using
>>>> DW_FORM_ref_sig8, just as they would be if these were types from a regular
>>>> DWARF type unit.
>>>>
>>>> Example
>>>> -------
>>>>
>>>> Let's say we have a module `MyModule` that defines a type `MyStruct`::
>>>>   $ cat foo.c
>>>>   #include <MyModule.h>
>>>>   MyStruct x;
>>>>
>>>> when compiling `foo.c` like this::
>>>>   clang -fmodules -gmodules foo.c -c
>>>>
>>>> clang produces `foo.o` and an ELF or Mach-O container for the module::
>>>>   /path/to/module-cache/MyModule.pcm
>>>>
>>>> In the module container, we have a section for the serialized AST and
>>>> split DWARF sections for the debug type info. The exact format is likely
>>>> still going to evolve a little, but this should give a rough idea::
>>>>
>>>>   MyModule.pcm:
>>>>     .debug_info.dwo:
>>>>       DW_TAG_compile_unit
>>>>         DW_AT_dwo_name ("/path/to/MyModule.pcm")
>>>>         DW_AT_dwo_id ([unique AST signature])
>>>>
>>>>       DW_TAG_type_unit ([hash for MyStruct])
>>>>         DW_TAG_structure_type
>>>>           DW_AT_signature ([hash for MyStruct])
>>>>           DW_AT_name "MyStruct"
>>>>           ...
>>>>
>>>>     .debug_abbrev.dwo:
>>>>       // abbrevs referenced by .debug_info.dwo
>>>>     .debug_line.dwo:
>>>>       // filenames referenced by .debug_info.dwo
>>>>     .debug_str.dwo:
>>>>       // strings referenced by .debug_info.dwo
>>>>
>>>>     .ast
>>>>       // Index at the top of the AST section sorted by hash value.
>>>>       [hash for MyStruct] -> [offset for MyStruct in this section]
>>>>       ...
>>>>       // Serialized AST follows
>>>>       ...
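To make the "index sorted by hash value" idea concrete, here is a rough Python sketch of how a module-aware consumer might map a type signature to an AST-section offset. The in-memory representation (a sorted list of `(signature, offset)` pairs), the name `lookup_ast_offset`, and the sample values are all assumptions for illustration; the proposal does not specify the on-disk encoding.

```python
from bisect import bisect_left

def lookup_ast_offset(index, signature):
    """Return the AST-section offset for a type signature, or None.

    `index` is a list of (signature, offset) pairs sorted by signature,
    mirroring the sorted index at the top of the .ast section.
    """
    # (signature,) sorts before every (signature, offset) pair, so
    # bisect_left lands on the matching entry if one exists.
    pos = bisect_left(index, (signature,))
    if pos < len(index) and index[pos][0] == signature:
        return index[pos][1]
    return None  # not in this module; a consumer would fall back to DWARF

# Hypothetical 64-bit signatures and offsets, for illustration only.
index = sorted([
    (0x9F3A11C2D4E5F601, 0x0400),
    (0x1B2C3D4E5F607182, 0x0120),
    (0x5566778899AABBCC, 0x02A8),
])
```

Because the index is sorted, each lookup is a binary search and needs no extra hash table. That is presumably why the sketch in the proposal sorts by hash value, though the thread does not say so explicitly.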
>>>>
>>>> The debug info in foo.o will look like this::
>>>>
>>>>   .debug_info.dwo
>>>>     DW_TAG_compile_unit
>>>>       // For DWARF consumers
>>>>       DW_AT_dwo_name ("/path/to/module-cache/MyModule.pcm")
>>>>       DW_AT_dwo_id ([unique AST signature])
>>>>
>>>>       // For LLDB / dsymutil so they can recreate the module
>>>>       DW_AT_name "MyModule"
>>>>       DW_AT_LLVM_system_root "/"
>>>>       DW_AT_LLVM_preprocessor_defines "-DNDEBUG"
>>>>       DW_AT_LLVM_include_path "/path/to/MyModule.map"
>>>>
>>>>   .debug_info
>>>>     DW_TAG_compile_unit
>>>>       DW_TAG_variable
>>>>         DW_AT_name "x"
>>>>         DW_AT_type (DW_FORM_ref_sig8) ([hash for MyStruct])
>>>>
>>>>
>>>> Type signatures
>>>> ---------------
>>>>
>>>> We are going to deviate from the DWARF spec by using a more efficient
>>>> hashing function that uses the type's unique mangled name and the name of
>>>> the module as input.
>>>>
>>>> Why do you need/want the name of the module here? Modules are not a
>>>> namespacing mechanism. How would you compute this name when the same type
>>>> is defined in multiple imported modules?
>>>
>>> Great point! I’m mostly concerned about non-ODR languages ...
>>>>
>>>> For languages that do not have mangled type names or an ODR,
>>>>
>>>> The people working on C modules have expressed an intent to apply the ODR
>>>> there too, so it's not clear that Clang modules will support any such
>>>> language in the longer term.
>>>
>>> ... and this may be the answer to the question!
>>>
>>> +Doug: do Objective-C modules have an ODR?
>>>
>>>>
>>>> we will use the unique identifiers produced by the clang indexer (USRs) as
>>>> input instead.
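For readers following the signature discussion, a minimal Python sketch of a name-based 64-bit signature may help. It is not the scheme from the proposal: the thread does not name a hash function, so BLAKE2b truncated to 8 bytes is used purely as a stand-in. The function name `type_signature` and the sample USR-style string are likewise assumptions. The module name is optional because whether it belongs in the hash is exactly what is being questioned above.

```python
import hashlib

def type_signature(unique_name, module_name=None):
    """Derive a 64-bit signature from a type's unique name.

    `unique_name` is the mangled name, or a USR for languages without
    mangled type names. `module_name` is mixed in only when given.
    """
    h = hashlib.blake2b(digest_size=8)  # stand-in hash, NOT from the proposal
    if module_name is not None:
        h.update(module_name.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    h.update(unique_name.encode("utf-8"))
    return int.from_bytes(h.digest(), "little")

# Hypothetical inputs, for illustration only.
sig_with_module = type_signature("c:@S@MyStruct", "MyModule")
sig_without_module = type_signature("c:@S@MyStruct")
```

Note that once the module name is part of the input, the same type defined in two different modules gets two different signatures, so the two copies no longer unique against each other. Leaving it out keeps one signature per type.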
>>>>
>>>> Extension: Replacing type units with a more efficient storage format
>>>> --------------------------------------------------------------------
>>>>
>>>> As an extension to this proposal, we are thinking of replacing the type
>>>> units within the module debug info with a more efficient format: Instead
>>>> of emitting each type into its own type unit (complete with its entire
>>>> declcontext), it would be much more efficient to emit one large bag
>>>> of DWARF together with an index that maps hash values (type signatures) to
>>>> DIE offsets.
>>>>
>>>> Next steps
>>>> ----------
>>>>
>>>> In order to implement this, the next steps would be as follows:
>>>> 1. Change the clang module format to be an ELF/Mach-O container.
>>>> 2. Teach clang to emit debug info for module types (e.g., by passing an
>>>>    empty compile unit with retained types to LLVM) into the module
>>>>    container.
>>>> 3a. Add a -gmodules switch to clang that triggers the emission of type
>>>>     signatures for types coming from a module.
>>>>
>>>> Can you clarify what this flag would do? Does this turn on adding DWARF to
>>>> the .pcm file? Does it turn off generating DWARF for imported modules in
>>>> the current IR module? Both?
>>>
>>> It would emit references to the types from imported modules instead of the
>>> types themselves.
>>> Since the module cache is shared, we could, depending on just how expensive
>>> this is, turn on DWARF generation for .pcm files by default. I’d like to
>>> measure this first, though.
>>>
>>>>
>>>> I assume this means that the default remains that we build debug
>>>> information for modules as if we didn't have modules (that is, put
>>>> complete DWARF with the object code). Do you think that's the right
>>>> long-term default? I think it's possibly not.
>>>
>>> I think you’re absolutely right about the long term.
>>> In the short term, it may be better to have compatibility by default, but
>>> I don’t know what the official LLVM policy on new features is, if there
>>> is one.
>>>
>>>>
>>>> How does this interact with explicit module builds? Can I use a module
>>>> built without -g in a compile that uses -g? And if I do, do I get complete
>>>> debug information, or debug info just for the parts that aren't in the
>>>> module? Does -gmodules let me choose between these?
>>>
>>> Personally I would expect old-style (full copy of the types) debug
>>> information if I build against a module that does not have embedded debug
>>> information.
>>>
>>> thanks,
>>> adrian
>>>>
>>>> 3b. Implement type-signature-based lookup in llvm-dsymutil and lldb.
>>>> 4a. Emit an index that maps type signatures to AST section offsets into
>>>>     the module container.
>>>> 4b. Implement direct loading of AST types in lldb.
>>>> 5a. Improve the efficiency by replacing type units in the module debug
>>>>     info with a lookup table that maps type signatures to DIE offsets.
>>>> 5b. Support this format in lldb and llvm-dsymutil.
>>>>
>>>> Let me know what you think!
>>>>
>>>> cheers,
>>>> Adrian
>>>>
>>>> [1] For more details about clang modules see
>>>> http://clang.llvm.org/docs/Modules.html and
>>>> http://clang.llvm.org/docs/PCHInternals.html
>>>>
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-...@cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
_______________________________________________ lldb-dev mailing list lldb-dev@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev