> >> On May 4, 2015, at 11:38 AM, David Blaikie <[email protected]> wrote: >> >> >> >> On Mon, May 4, 2015 at 11:24 AM, Adrian Prantl <[email protected]> wrote: >> >>> On May 4, 2015, at 10:53 AM, David Blaikie <[email protected]> wrote: >>> >>> >>> >>> On Fri, May 1, 2015 at 8:52 PM, Adrian Prantl <[email protected]> wrote: >>>> >>>>> On May 1, 2015, at 5:25 PM, David Blaikie <[email protected]> wrote: >>>>> >>>>> >>>>> >>>>> On Fri, May 1, 2015 at 5:19 PM, Adrian Prantl <[email protected]> wrote: >>>>> >>>>>> On May 1, 2015, at 4:55 PM, David Blaikie <[email protected]> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Fri, May 1, 2015 at 4:39 PM, Adrian Prantl <[email protected]> wrote: >>>>>> >>>>>> > On May 1, 2015, at 10:01 AM, David Blaikie <[email protected]> wrote: >>>>>> > >>>>>> > >>>>>> > >>>>>> > On Fri, May 1, 2015 at 9:52 AM, Adrian Prantl <[email protected]> >>>>>> > wrote: >>>>>> >> >>>>>> >>> On May 1, 2015, at 9:23 AM, David Blaikie <[email protected]> wrote: >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> On Thu, Apr 30, 2015 at 5:21 PM, Adrian Prantl <[email protected]> >>>>>> >>> wrote: >>>>>> >>> >>>>>> >>> > On Apr 30, 2015, at 4:55 PM, David Blaikie <[email protected]> >>>>>> >>> > wrote: >>>>>> >>> > >>>>>> >>> > >>>>>> >>> > >>>>>> >>> > On Thu, Apr 30, 2015 at 4:31 PM, Adrian Prantl <[email protected]> >>>>>> >>> > wrote: >>>>>> >>> >> >>>>>> >>> >> > On Mar 19, 2015, at 5:37 PM, David Blaikie <[email protected]> >>>>>> >>> >> > wrote: >>>>>> >>> >> > >>>>>> >>> >> > >>>>>> >>> >> > >>>>>> >>> >> > On Thu, Mar 19, 2015 at 5:24 PM, Adrian Prantl >>>>>> >>> >> > <[email protected]> wrote: >>>>>> >>> >> >> >>>>>> >>> >> >> > On Mar 16, 2015, at 2:55 PM, David Blaikie >>>>>> >>> >> >> > <[email protected]> wrote: >>>>>> >>> >> >> > >>>>>> >>> >> >> > >>>>>> >>> >> >> > >>>>>> >>> >> >> >> On Mon, Mar 16, 2015 at 2:45 PM, Robinson, Paul >>>>>> >>> >> >> >> <[email protected]> wrote: >>>>>> >>> >> >> > Beyond the above (that using a new tag would mean this 
would >>>>>> >>> >> >> > go from 'free' to 'not free' for GDB) having a new top level >>>>>> >>> >> >> > tag is pretty substantial (we only have two at the moment, >>>>>> >>> >> >> > and with our talk of modules being a "bag of dwarf" might go >>>>>> >>> >> >> > back to having one top level tag? (it's not clear to me from >>>>>> >>> >> >> > DWARF4 whether DW_TAG_module is currently a top-level tag, I >>>>>> >>> >> >> > don't think it is?) >>>>>> >>> >> >> > >>>>>> >>> >> >> >> The .debug_info section contains one or more compilation >>>>>> >>> >> >> >> units, partial units, or in DWARF 5, type units. >>>>>> >>> >> >> >> DW_TAG_module isn't a unit, if you want it to be handled >>>>>> >>> >> >> >> independently then it would need to be wrapped in a >>>>>> >>> >> >> >> DW_TAG_partial_unit. You would probably then use >>>>>> >>> >> >> >> DW_TAG_imported_unit to refer to it, rather than >>>>>> >>> >> >> >> DW_TAG_imported_module. >>>>>> >>> >> >> >> >>>>>> >>> >> >> > >>>>>> >>> >> >> > This makes a fair bit of sense - though the terminology's >>>>>> >>> >> >> > never going to quite line up with modules, I suspect, and >>>>>> >>> >> >> > this would still require modifying existing consumers (well, >>>>>> >>> >> >> > GDB) that can handle split-dwarf today, I suspect (not sure >>>>>> >>> >> >> > how it'd handle partial_unit - maybe that does work? - and >>>>>> >>> >> >> > still don't know how existing consumers would handle >>>>>> >>> >> >> > imported_unit either - could be worth some testing, as it >>>>>> >>> >> >> > sounds sort of right out of several less right options). >>>>>> >>> >> >> >>>>>> >>> >> >> Thanks for all the input so far! >>>>>> >>> >> >> To concretize this end of the discussion up let’s sketch some >>>>>> >>> >> >> dwarf of how this could look like in practice. 
>>>>>> >>> >> >> >>>>>> >>> >> >> ELF (no imports) >>>>>> >>> >> >> ---------------- >>>>>> >>> >> >> >>>>>> >>> >> >> On ELF or COFF a foo.c referencing types from the module >>>>>> >>> >> >> Foundation looks like this: >>>>>> >>> >> >> >>>>>> >>> >> >> .debug_info: >>>>>> >>> >> >> DW_TAG_compile_unit >>>>>> >>> >> >> DW_AT_name(“foo.c”) >>>>>> >>> >> >> >>>>>> >>> >> >> .debug_info.dwo (on ELF: group 0x1234ABCDE, comdat) >>>>>> >>> >> >> DW_TAG_partial_unit >>>>>> >>> >> > >>>>>> >>> >> > For now I'd suggest we use compile_unit - that way it'll just >>>>>> >>> >> > work with existing split-dwarf consumers. We can see about >>>>>> >>> >> > standardizing a top-level DW_TAG_module or using >>>>>> >>> >> > DW_TAG_partial_unit here later, perhaps? I'm not sure. >>>>>> >>> >> > >>>>>> >>> >> >> >>>>>> >>> >> >> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) >>>>>> >>> >> >> DW_AT_dwo_id(“0x1234ABCDE”) >>>>>> >>> >> >> >>>>>> >>> >> >> >>>>>> >>> >> >> Side question: Is .debug_info.dwo the right section to put the >>>>>> >>> >> >> module skeleton in, or should it be a .debug_info section like >>>>>> >>> >> >> normal fission skeletons? >>>>>> >>> >> > >>>>>> >>> >> > Skeletons go in .debug_info, the dwo sections are just for the >>>>>> >>> >> > .dwo file (or the module file, in our new case - the extension >>>>>> >>> >> > isn't actually important). >>>>>> >>> >> > >>>>>> >>> >> > It might be worth you compiling an example or two of >>>>>> >>> >> > split-dwarf to see how this all works hands-on. 
>>>>>> >>> >> > >>>>>> >>> >> >> Mach-O (no comdat, no imports) >>>>>> >>> >> >> ------------------------------ >>>>>> >>> >> >> >>>>>> >>> >> >> Mach-O doesn’t do comdat, so with -split-dwarf=Disable (not >>>>>> >>> >> >> sure if that option is the best discriminator) this could look >>>>>> >>> >> >> like: >>>>>> >>> >> >> >>>>>> >>> >> >> .debug_info: >>>>>> >>> >> >> DW_TAG_compile_unit >>>>>> >>> >> >> DW_AT_name(“foo.c”) >>>>>> >>> >> >> DW_TAG_partial_unit >>>>>> >>> >> >> >>>>>> >>> >> >> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) >>>>>> >>> >> >> DW_AT_dwo_id(“0x1234ABCDE”) >>>>>> >>> >> >> >>>>>> >>> >> >> >>>>>> >>> >> >> Mach-O (no comdat, with imports) >>>>>> >>> >> >> ------------------------------ >>>>>> >>> >> >> >>>>>> >>> >> >> If we add the module import information to this, we get: >>>>>> >>> >> >> >>>>>> >>> >> >> .debug_info: >>>>>> >>> >> >> DW_TAG_compile_unit >>>>>> >>> >> >> DW_AT_name(“foo.c”) >>>>>> >>> >> >> DW_TAG_imported_module >>>>>> >>> >> >> DW_AT_import(DW_FORM_ref_addr 0x10) >>>>>> >>> >> > >>>>>> >>> >> > Since we got went down the tangent of explaining split-dwarf >>>>>> >>> >> > many emails ago, I've forgotten (& can't readily find) what we >>>>>> >>> >> > were discussing about what ways the imported_module could work. >>>>>> >>> >> > >>>>>> >>> >> > The simplest representation I can think of would be to have it >>>>>> >>> >> > reference, by signature, the module unit (whatever tag it uses) >>>>>> >>> >> > - DW_FORM_ref_sig8, seems the simplest thing to do. >>>>>> >>> >> > >>>>>> >>> >> >> >>>>>> >>> >> >> DW_TAG_partial_unit >>>>>> >>> >> >> >>>>>> >>> >> >> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) >>>>>> >>> >> >> DW_AT_dwo_id(“0x1234ABCDE”) >>>>>> >>> >> >> >>>>>> >>> >> >> 0x10: >>>>>> >>> >> > >>>>>> >>> >> > This is inside the partial unit? 
I figured we'd just put these >>>>>> >>> >> > attributes on the top level (compile_unit, or whatever it might >>>>>> >>> >> > be later) - potentially conditionalized on platform, sure. >>>>>> >>> >> > >>>>>> >>> >> >> DW_TAG_module >>>>>> >>> >> >> DW_AT_name(“Foundation”) >>>>>> >>> >> >> DW_AT_LLVM_sysroot(“/“) >>>>>> >>> >> >> DW_AT_LLVM_include_dir(“”) >>>>>> >>> >> >> DW_AT_LLVM_macros(“-DNDEBUG”) >>>>>> >>> >> >> ... >>>>>> >>> >> >> >>>>>> >>> >> >> >>>>>> >>> >> >> ELF (comdat, with imports) >>>>>> >>> >> >> -------------------------- >>>>>> >>> >> >> >>>>>> >>> >> >> But now let’s go back to ELF. Since the skeleton with the >>>>>> >>> >> >> partial unit is comdat'd, I assume that this breaks the >>>>>> >>> >> >> FORM_ref_addr used in the DW_AT_import. We could reuse the >>>>>> >>> >> >> module hash as a signature for the module: >>>>>> >>> >> >> >>>>>> >>> >> >> .debug_info: >>>>>> >>> >> >> DW_TAG_compile_unit >>>>>> >>> >> >> DW_AT_name(“foo.c”) >>>>>> >>> >> >> DW_TAG_imported_module >>>>>> >>> >> >> DW_AT_import(DW_FORM_ref_addr 0x1234ABCDE) >>>>>> >>> >> > >>>>>> >>> >> > Still only really need these imported_modules for lldb, right? >>>>>> >>> >> > I'd consider having them off-by-default for non-darwin, but I'm >>>>>> >>> >> > not strictly wedded to that notion. Wouldn't mind seeing size >>>>>> >>> >> > impact numbers of some kind - if it's really fractional % >>>>>> >>> >> > increase & GDB doesn't fall over when it sees them (in whatever >>>>>> >>> >> > FORM/tag/etc we decide on) then that's not the end of the world. >>>>>> >>> >> > >>>>>> >>> >> > Just seems nice if the default mode is the nice, standard, >>>>>> >>> >> > split-dwarf output. Doesn't need anything fancy. 
>>>>>> >>> >> > >>>>>> >>> >> > >>>>>> >>> >> >> .debug_info.dwo (group 0x1234ABCDE, comdat) >>>>>> >>> >> >> DW_TAG_partial_unit >>>>>> >>> >> >> >>>>>> >>> >> >> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/Foundation.pcm”) >>>>>> >>> >> >> DW_AT_dwo_id(“0x1234ABCDE”) >>>>>> >>> >> >> >>>>>> >>> >> >> DW_TAG_module >>>>>> >>> >> >> DW_AT_signature(“0x1234ABCDE”) >>>>>> >>> >> >> DW_AT_name(“Foundation”) >>>>>> >>> >> > >>>>>> >>> >> > >>>>>> >>> >> > The thing you haven't covered is the actual .dwo sections >>>>>> >>> >> > (.debug_info.dwo (we'll probably need a simple stub >>>>>> >>> >> > compile_unit to make this correct split-dwarf) and >>>>>> >>> >> > .debug_types.dwo being important - but all the supporting .dwo >>>>>> >>> >> > sections will be necessary) that go in the module file. >>>>>> >>> >> > >>>>>> >>> >> >> This is bending the definition of DW_AT_signature, but I guess >>>>>> >>> >> >> it could be made to work. Or we could say that for now, users >>>>>> >>> >> >> have to choose between the comdat optimization and having the >>>>>> >>> >> >> module imports recorded in Dwarf, since GDB wouldn’t know what >>>>>> >>> >> >> to do with that information anyway. >>>>>> >>> >> >>>>>> >>> >> Sorry for the long delay. Here’s a more complete example that >>>>>> >>> >> should include all the suggestions made so far. For context I >>>>>> >>> >> also included external type references in the example although >>>>>> >>> >> admittedly this is a bit out of scope for this thread: >>>>>> >>> >> >>>>>> >>> >> ELF (typeunits, comdats, with imports) >>>>>> >>> >> -------------------------------------- >>>>>> >>> >> >>>>>> >>> >> On ELF or COFF a bar.c referencing type Foo from the module >>>>>> >>> >> FooLib looks like this: >>>>>> >>> >> >>>>>> >>> >> bar.o >>>>>> >>> >> ~~~~~ >>>>>> >>> >> >>>>>> >>> >> // To keep this example focussed/readable, I'm assuming that >>>>>> >>> >> bar.o itself was not compiled with fission. 
>>>>>> >>> >> .debug_info: >>>>>> >>> >> DW_TAG_compile_unit >>>>>> >>> >> DW_AT_name(“bar.c”) >>>>>> >>> >> ... >>>>>> >>> >> >>>>>> >>> >> DW_TAG_imported_module // <- This could be optional on ELF. >>>>>> >>> >> DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234) >>>>>> >>> >> >>>>>> >>> >> DW_TAG_variable >>>>>> >>> >> DW_AT_name(“MyFoo”) >>>>>> >>> >> DW_AT_type [DW_FORM_ref4] 0x20 >>>>>> >>> >> 0x20: >>>>>> >>> >> DW_TAG_structure_type >>>>>> >>> >> DW_AT_declaration (true) >>>>>> >>> >> DW_AT_signature [DW_FORM_ref_sig8] (0xF00) >>>>>> >>> >> >>>>>> >>> >> >>>>>> >>> >> // Split DWARF skeleton CU for the module Foo. >>>>>> >>> >> DW_TAG_compile_unit >>>>>> >>> >> >>>>>> >>> >> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) >>>>>> >>> >> DW_AT_dwo_id(“0xFEDB9876”) >>>>>> >>> >> ... >>>>>> >>> >> >>>>>> >>> >> // Comdat’d partial unit containing the optional module >>>>>> >>> >> descriptor. >>>>>> >>> >> .debug_info, group 0xABCD1234, comdat >>>>>> >>> >> DW_TAG_partial_unit >>>>>> >>> >> DW_TAG_module >>>>>> >>> >> DW_AT_name(“FooLib”) >>>>>> >>> >> DW_AT_LLVM_sysroot(“/“) >>>>>> >>> >> DW_AT_LLVM_include_dirs(“-I/path”) >>>>>> >>> >> DW_AT_LLVM_macros(“-DNDEBUG”) >>>>>> >>> >> ... >>>>>> >>> >> >>>>>> >>> >> FooLib-XYZ.pcm >>>>>> >>> >> ~~~~~~~~~~~~~~ >>>>>> >>> >> >>>>>> >>> >> .debug_info.dwo >>>>>> >>> >> DW_TAG_compile_unit >>>>>> >>> >> DW_AT_dwo_id(“0xFEDB9876”) >>>>>> >>> >> ... >>>>>> >>> >> >>>>>> >>> >> // Type unit for the type Foo. >>>>>> >>> >> .debug_types.dwo, group 0xF00, comdat >>>>>> >>> >> DW_TAG_type_unit >>>>>> >>> >> DW_TAG_structure_type >>>>>> >>> >> DW_AT_name (“Foo”) >>>>>> >>> >> ... >>>>>> >>> >> >>>>>> >>> >> >>>>>> >>> >> I think it awkward to have both the skeleton compile_unit in >>>>>> >>> >> .debug_info and the partial_unit containing the TAG_module. 
>>>>>> >>> >> Personally I’d prefer putting the TAG_module into the skeleton CU >>>>>> >>> >> and then just refer to it via a FORM_ref_addr; but if we want to >>>>>> >>> >> put the TAG_module into a comdat section, it looks like that’s >>>>>> >>> >> what’s necessary. >>>>>> >>> > >>>>>> >>> > It's been a while & I've probably lost all the context, but I >>>>>> >>> > think my original theory was to have the skeleton compile_unit be >>>>>> >>> > comdat'd so they'd deduplicate on linking (so we'd only have one >>>>>> >>> > reference to the module.dwo in the linked binary). I don't recall >>>>>> >>> > there being a need for a separate partial_unit - I imagine we'd >>>>>> >>> > just put the LLDB/LLVM extension attributes on the skeleton >>>>>> >>> > compile_unit and expect debuggers that didn't understand them, to >>>>>> >>> > ignore them. >>>>>> >>> > >>>>>> >>> > Was there some reason this didn't work/make sense? Because you >>>>>> >>> > need a DW_TAG_module to import with DW_TAG_imported_module? >>>>>> >>> Using DW_TAG_module was the best practice that was recommended on >>>>>> >>> dwarf-discuss. >>>>>> >>> >>>>>> >>> Did they have any ideas on how to reference it without duplicating >>>>>> >>> it in every CU? >>>>>> >> >>>>>> >> We didn’t touch the deduplication issue. >>>>>> >> >>>>>> >>> Once we've got the "Bag O Dwarf" stuff (rather than the narrower >>>>>> >>> type units) this would be easier - (I suppose we could do a partial >>>>>> >>> solution/abuse of type units - use a type unit header (perhaps with >>>>>> >>> Eric's merged type/compile unit work) and a DW_FORM_ref_sig8 value >>>>>> >>> for the DW_AT_module in the DW_TAG_imported_module. >>>>>> >>> >>>>>> >>> Though I suppose if we're going to have DW_TAG_imported_module in >>>>>> >>> every CU that references a module, it might not be that big of a >>>>>> >>> deal to include the DW_TAG_module itself there too... 
while I don't >>>>>> >>> care about this scheme immediately, Google's growing LLDB investment >>>>>> >>> in various platforms, so I am vaguely concerned about getting this >>>>>> >>> right & it's not immediately obvious to me what that right answer is. >>>>>> >> >>>>>> >> Maybe the best path forward is to stage this by initially putting the >>>>>> >> DW_TAG_module into the main CU and leave the deduplication as an >>>>>> >> optimization to be implemented once the bag’o dwarf is more fleshed >>>>>> >> out. This way we won’t do anything that would confuse consumers >>>>>> >> (assuming they ignore unknown tags) and the extra overhead is likely >>>>>> >> not even going to be noticeable, since all the string attributes >>>>>> >> inside the TAG_module can already be deduplicated by traditional >>>>>> >> means. >>>>>> > >>>>>> > Perhaps. I'd still like to think through/document what this looks like >>>>>> > a bit more. Where the data ends up, what it's used for, etc. Sorry to >>>>>> > draw this out. >>>>>> > >>>>>> > :/ *ponders* >>>>>> >>>>>> >>>>>> Let’s construct this: >>>>>> >>>>>> The most straightforward representation is to not unique the TAG_module >>>>>> and place it into the main CU. >>>>>> >>>>>> bar.o >>>>>> ~~~~~ >>>>>> >>>>>> .debug_info: >>>>>> DW_TAG_compile_unit >>>>>> ... >>>>>> DW_TAG_imported_module >>>>>> DW_AT_import [DW_FORM_ref4] (0x20) >>>>>> 0x20: >>>>>> DW_TAG_module >>>>>> DW_AT_name(“FooLib”) >>>>>> DW_AT_LLVM_sysroot(“/“) >>>>>> DW_AT_LLVM_include_dirs(“-I/path”) >>>>>> DW_AT_LLVM_macros(“-DNDEBUG”) >>>>>> >>>>>> Might as well put all these LLVM attributes on the skeleton CU, though - >>>>>> so they can be deduplicated (& just put the dwo_id in this module >>>>>> somewhere, perhaps just using the DW_AT_dwo_id attribute - possibly >>>>>> that's the only attribute the DW_TAG_module would need, ideally). 
Unless >>>>>> we need to consider the submodule issue (in which case the skeleton unit >>>>>> would reference the whole module but the submodules would >>>>>> reference/describe the respective submodules?)? >>>>> >>>>> We cannot put them into the skeleton CU if the skeleton CU is going to be >>>>> comdat’d, because we’d then have to refer to it via a signature and that >>>>> leads us directly to the can of worms discussed in the next paragraph :-) >>>>>> >>>>>> ... >>>>>> >>>>>> // Split DWARF skeleton, comdat'd. >>>>>> .debug_info, group 0xFEDB9876, comdat >>>>>> DW_TAG_compile_unit >>>>>> >>>>>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) >>>>>> DW_AT_dwo_id(“0xFEDB9876”) >>>>>> ... >>>>>> >>>>>> On Mach-O the split DWARF skeleton would not be a comdat’d, but >>>>>> llvm-dsymutil can just ignore it. >>>>>> >>>>>> >>>>>> If we want to dedup the TAG_module we need to refer to it via signature. >>>>>> This means we need to wrap it in a type_unit or a DWARF5 TAG_type_unit. >>>>>> We might as well throw it in with the skeleton CU. >>>>>> >>>>>> .debug_info: >>>>>> DW_TAG_compile_unit >>>>>> ... >>>>>> DW_TAG_imported_module >>>>>> DW_AT_import [DW_FORM_ref_sig8] (0xABCD1234) >>>>>> >>>>>> // Split DWARF skeleton, comdat'd. >>>>>> .debug_info, group 0xFEDB9876, comdat >>>>>> DW_TAG_compile_unit >>>>>> >>>>>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) >>>>>> DW_AT_dwo_id(“0xFEDB9876”) >>>>>> ... >>>>>> DW_TAG_type_unit (signature: 0xABCD1234) >>>>>> >>>>>> Can't really put a type_unit inside a compile_unit - it'd need to be >>>>>> top-level with an appropriate type unit header, etc. & then we'd need >>>>>> two different units/headers, could still comdat them, but it's a weird >>>>>> abuse of type units & would probably confuse consumers. I don't know >>>>>> whether that's worth the effort. >>>>> Oh right. 
>>>>> >>>>>> >>>>>> DW_TAG_module >>>>>> DW_AT_name(“FooLib”) >>>>>> DW_AT_LLVM_sysroot(“/“) >>>>>> DW_AT_LLVM_include_dirs(“-I/path”) >>>>>> DW_AT_LLVM_macros(“-DNDEBUG”) >>>>>> ... >>>>>> >>>>>> Now that raises the question about what happens with multiple modules >>>>>> within one PCM. >>>>>> >>>>>> Is the right term "submodule"? it's sort of confusing to talk about >>>>>> multiple modules within a pcm. >>>>> >>>>> Yes, a module with nested submodules. >>>>> http://clang.llvm.org/docs/Modules.html#submodule-declaration >>>>> >>>>>> >>>>>> Assuming that the ELF linker is linking and deduping all the non-.dwo >>>>>> sections, we may loose some of the TAG_modules (if not every CU imports >>>>>> all submodules) in the binary, but that wouldn’t matter because the >>>>>> consumer would find all TAG_modules by signature in the .pcm >>>>>> >>>>>> Is there any reason we need to reference the submodules individually, >>>>>> rather than just reference the whole module >>>>> >>>>> My assumption is that an AST-aware debugger will want to import the exact >>>>> submodules that were imported by the CU before dropping into the >>>>> expression evaluator to replicate the environment of the CU as much as >>>>> possible. >>>>> >>>>> I'm just not picturing that. It seems pretty likely that a debugger user >>>>> is more likely to treat the whole set of names in the program, not just >>>>> those syntactically valid at that point in the source file. >>>> >>>> Module imports only work if the debugger has the precise list of models >>>> imported by the current CU. Clang modules are not namespaces, and any two >>>> modules may conflict. >>> >>> Right, as you say - ODR & C languages. (& I've no idea if file-scoped >>> static/anonymous namespace things can go in C++ modules and what happens if >>> you have conflicting modules in that regard - I guess they can conflict >>> too? 
Dunno - maybe anon namespaces in C++ modules aren't allowed) >> >> It sounds like a strange concept to put an anonymous namespace into a public >> module, but then again there exists clang/test/Modules/anon-namespace.cpp >> (it only uses an empty anonymous namespace, though). I’m not sure how this >> is meant to be used. >> >>>> >>>> The cool thing is that with the imported modules the debugger effectively >>>> becomes clang and have the entire world visible to the current CU >>>> available, including any types and functions that never made it into the >>>> debug info because they were optimized out, or because there were >>>> uninstantiated templates that cannot be represented by DWARF. >>>> >>>>> A simple example would be if I'm debugging LLVM and I'm in some generic >>>>> optimization pass, but I want to cast my Instruction pointer to some >>>>> specific instruction type to examine it in more detail - even though this >>>>> pass doesn't care about that specific Instruction type nor include the >>>>> header in which it's declared. >>>> >>>> If, however, the type lookup fails, the debugger can still fall back to >>>> the traditional behavior, find the type in the accelerator tables and >>>> reconstruct it from DWARF (if it is there). >>> >>> So you're going to need to implement fission (to at least some degree) >>> support in LLDB, then? (to support the case where you haven't linked debug >>> info with llvm-dsymutil, but you've hit one of these lookup problems where >>> you need to cross possibly-conflicting modules) >> >> Yes. Specifically, it won’t support type units, and it will look up types by >> name rather than by signature. (cf. the second part of >> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html) > > How are you going to reference the types in the module's fission CU without > type units/signatures? 
Are you going to emit type declarations into the > normal CU and rely on the debugger to know that these declarations can be > resolved by looking elsewhere? (just without the benefit of constraining that > search to just looking for a matching TU?)
If you look at the example in http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150427/128278.html, there will be an external type index (using the usual accelerator table format) that maps an external type’s UID to a pcm. In the pcm there is an extra accelerator table entry that maps UID to DIE offset. >> >> >>> >>> OK, so I think it's probably reasonable for now to just add DW_TAG_modules >>> to the CU for each referenced module (or does it have to be each referenced >>> submodule? (can two submodules within a single module be >>> contradictory/conflicting?)). Since we don't have any good way to reference >>> the module is a foreign unit while deduplicating that unit... there's not >>> much point having the imported_module - but if you think it adds anything, >>> I'm open to ideas. >> It could help keeping things simpler. >> Emitting it doesn’t add much semantic value because module imports always >> occur at the top level, but it will make the transition to the deduplicated >> TAG_modules easier — It could be easier to teach consumers once about >> imported_module({ref to TAG_module}) rather than having them also recognize >> top-level TAG_modules as an intermediate step. It’s also slightly easier to >> implement in LLVM because the imported_module allows us to anchor the >> TAG_module in the CU, but that’s not a very strong argument. > > Agreed on all counts (not a strong argument, but convenient enough, etc, etc). > > I'm still not entirely sure what the right answer is here, though, which is > why I'm hesitant to bake anything in too strongly. > > To come back to one of the outstanding questions: Do you need submodule > import information, or just module level (if modules cannot have internal > conflicts and you can't avoid cross-module conflicts just by lack of > visibility (I have no idea if either of those things are true) then you may > just need per-module not per-submodule info)? 
At the moment I do not think that it makes sense for two submodules to conflict, but there is nothing in the clang documentation that explicitly forbids this. With this in mind, I think it is reasonable to not support submodules (at least initially) and always emit an import for the parent module.

That’s what I wanted to write ... but as I’m browsing through our documentation, http://clang.llvm.org/docs/Modules.html#conflict-declarations explicitly gives an example of two conflicting submodules, so maybe this is not a reasonable simplification after all. On the other hand, a quick grep over all system module maps on OS X doesn’t show a single conflict declaration. I still believe we do not need to support submodules right from the start, but we should have a story for getting there if we need to.

>
> Also, does each submodule need different special attributes/flags? If the
> special codegen attributes you want are at the module level, it'd probably be
> best to keep those on the Skeleton CU for the module (that will be comdat
> folded, etc, on ELF - and they could be DWARF-aware deduplicated by
> llvm-dsymutil) so they're not duplicated. The DW_TAG_module would then just
> have a DW_AT_signature attribute or something similarly small/trivial to
> point to the skeleton CU.

The attributes are derived from cc1 command line arguments. No two submodules imported by one CU can have different attributes. All submodules in a pcm also share their attributes. The skeleton CU appears to be the most efficient place to put them, though perhaps not the most logical one. I would prefer to stick the attributes on the (top-level) DW_TAG_module and later deduplicate the attributes together with the DW_TAG_module. Sticking them on the skeleton won’t save any space in the .o files and would only save 3*4-8=4 bytes (3x FORM_strp for include, macro, and isysroot - 1x FORM_ref_sig8) per CU and imported module.
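To make the 3*4-8 arithmetic above concrete, here is a minimal sketch. It assumes 32-bit DWARF, where a DW_FORM_strp value is a 4-byte offset into .debug_str and a DW_FORM_ref_sig8 value is an 8-byte signature. The function names and the CU/import counts are made up for illustration, and the sketch ignores abbreviation-table overhead.

```python
# Size arithmetic for keeping the module attributes on each DW_TAG_module
# versus replacing them with one signature reference to the skeleton CU.
# Assumption: 32-bit DWARF form sizes.
FORM_STRP_BYTES = 4      # DW_FORM_strp: offset into .debug_str
FORM_REF_SIG8_BYTES = 8  # DW_FORM_ref_sig8: 8-byte signature

# Illustrative labels for the three string attributes (include, macro, isysroot).
STRING_ATTRS = ("include_dirs", "macros", "sysroot")


def bytes_saved_per_import(num_string_attrs=len(STRING_ATTRS)):
    """Bytes saved per CU and imported module by referencing the skeleton CU."""
    inline_cost = num_string_attrs * FORM_STRP_BYTES  # 3 * 4 = 12
    reference_cost = FORM_REF_SIG8_BYTES              # 8
    return inline_cost - reference_cost


def total_bytes_saved(num_cus, imports_per_cu):
    """Hypothetical whole-program saving for a given number of CUs and imports."""
    return num_cus * imports_per_cu * bytes_saved_per_import()


print(bytes_saved_per_import())     # 4
print(total_bytes_saved(1000, 20))  # 80000
```

Even with a hypothetical 1000 CUs importing 20 modules each, the saving stays under 100 KB, which is why the placement of the attributes can be decided on clarity rather than size.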
>
> If you need submodule import lists, then each DW_TAG_module representing a
> submodule would have a name (anything else?) and the signature referring to
> its module skeleton CU.

What I’m envisioning is

.debug_info:
  DW_TAG_compile_unit
    ...
    DW_TAG_imported_module // import FooSubA
      DW_AT_import [DW_FORM_ref4] (0x60)
    DW_TAG_module
      DW_AT_name(“FooLib”)
      DW_AT_LLVM_sysroot(“/”)
      DW_AT_LLVM_include_dirs(“-I/path”)
      DW_AT_LLVM_macros(“-DNDEBUG”)
0x60:
      DW_TAG_module
        DW_AT_name(“FooSubA”)
        // need not be emitted if not referenced.
        DW_TAG_module
          DW_AT_name(“FooSubASubA”)
      // need not be emitted if not referenced.
      DW_TAG_module
        DW_AT_name(“FooSubB”)

-- adrian

> >> >>> Maybe later (when we have Bag O' DWARF) we can do that. & only do this when >>> targeting lldb (on by default on Darwin, off by default elsewhere). >>> >>> & LLDB, once it's got the Fission support it'll need for this anyway, will >>> fallback gracefully if these special modules are omitted. >> >> Sounds good to me! >> >> -- adrian >> >>> >>> - David >>> >>> >>>> >>>>> (& have just a single, whole module in the pcm)? >>>> >>>> That’s probably not what you meant, but just to be sure: The pcm will >>>> always have the entire module with all submodules in it. But the debugger >>>> may choose to import only a subset of those. >>>> >>>>> >>>>> file referred to by whichever skeleton CU makes it into the binary: >>>>> >>>>> FooLib-XYZ.pcm >>>>> ~~~~~~~~~~~~~~ >>>>> >>>>> .debug_info.dwo >>>>> DW_TAG_compile_unit >>>>> DW_AT_dwo_id(“0xFEDB9876”) >>>>> ... >>>>> >>>>> DW_TAG_type_unit (signature: 0xABCD1234) >>>>> DW_TAG_module >>>>> DW_AT_name(“FooLib”) >>>>> ... >>>>> DW_TAG_type_unit (signature: 0xCDEF3456) >>>>> DW_TAG_module >>>>> DW_AT_name(“FooLib”) >>>>> DW_TAG_module >>>>> DW_AT_name(“SubFoo”) >>>>> ... >>>>> >>>>> So.. this should work as long as nobody points out that a module isn’t >>>>> really a type. >>>>> >>>>> Yeah, probably worth waiting for "Bag O DWARF". 
>>>>> >>>>> For now, as you mentioned earlier, maybe just putting the imported_module >>>>> and the module into the compile_unit when tuning for LLDB (so Darwin by >>>>> default, and anywhere else where someone tunes for LLDB in the future) & >>>>> leave them out otherwise. >>>> >>>> Sounds prefectly reasonable. >>>>> >>>>> Could you remind me why LLDB wants to know which modules are referenced >>>>> from a CU? (rather than just all the modules used by a program overall?) >>>> >>>> LLDB uses clang for the expression evaluation. Traditionally it would look >>>> up a type in DWARF, build a clang AST out of it and then import it. With >>>> this it could directly import the clang modules and have access to >>>> everything in the module. But, clang modules are not namespaces, so >>>> modules can conflict (and that would probably manifest as a crash in >>>> libclang). >>>> >>>> What's an example of such a conflict? Is that valid (or is it just in ODR >>>> violations) - as mentioned above, it seems to me that only importing the >>>> things lexically available in this source file isn't what a debugger user >>>> would really want. I certainly think I'd trip over that a lot. >>> >>> Keep in mind that Objective-C (and C) do not have an ODR, so it’s not just >>> “just” :-) >>> Being able to import modules does not mean that the debugger cannot still >>> fall back to loading types from DWARF; in fact it will have to do that for >>> all local types anyway. >>> >>> -- adrian >>> >>>> >>>> It therefore needs to know which modules are imported in the current CU >>>> before dropping into the expression evaluator. >>>> >>>> - adrian >>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Macho-O, in the absence of comdats, we have: >>>>> >>>>> bar.o >>>>> ~~~~~ >>>>> >>>>> .debug_info: >>>>> DW_TAG_compile_unit >>>>> ... >>>>> DW_TAG_imported_module >>>>> DW_AT_import [DW_FORM_ref4] (0x20) >>>>> >>>>> DW_TAG_module // uniqued by dsymutil. 
>>>>> DW_AT_name(“FooLib”) >>>>> DW_AT_LLVM_sysroot(“/“) >>>>> DW_AT_LLVM_include_dirs(“-I/path”) >>>>> DW_AT_LLVM_macros(“-DNDEBUG”) >>>>> ... >>>>> >>>>> // Split DWARF skeleton, thrown out by dsymutil. >>>>> >>>>> Thrown out? Because it's going to read everything in from the module and >>>>> merge it in to a single linked debug info blob, I take it? >>>>> >>>>> .debug_info, group 0xFEDB9876, comdat >>>>> DW_TAG_compile_unit >>>>> >>>>> DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”) >>>>> DW_AT_dwo_id(“0xFEDB9876”) >>>>> ... >>>>> >>>>> FooLib-XYZ.pcm >>>>> ~~~~~~~~~~~~~~ >>>>> >>>>> .debug_info: >>>>> DW_TAG_compile_unit >>>>> DW_AT_dwo_id(“0xFEDB9876”) >>>>> ... >>>>> >>>>> DW_TAG_module >>>>> DW_AT_name(“FooLib”) >>>>> DW_TAG_module >>>>> DW_AT_name(“SubFoo”) >>>>> ... >>>>> >>>>> -- adrian >>>>> >>>>> > >>>>> >> >>>>> >>> >>>>> >>> > If it turns out that's the right way to get a target for the >>>>> >>> > imported_module, we could put both the skeleton CU and the partial >>>>> >>> > unit in the same comdat and dedup them both together. >>>>> >>> >>>>> >>> I think this works as long as we only have one TAG_module per .pcm >>>>> >>> file (because we need to refer to it via signature). >>>>> >>> >>>>> >>> Not quite following here - why would we have more than one module per >>>>> >>> pcm - a pcm is a module, right? >>>>> >> >>>>> >> Clang modules may have submodules and a compile unit could import two >>>>> >> submodules that live in the same .pcm file. For example on Darwin >>>>> >> there is a module Darwin.pcm that contains a submodule “C" that >>>>> >> contains the submodule “stdio". >>>>> > >>>>> > OK, so this bit's relevant to your use case in LLDB of loading the >>>>> > right things for the right context, but not relevant to the >>>>> > context-less debuggers like GDB that will just treat everything as one >>>>> > big namespace (except for file-local things, etc). 
>>>>> > So it's important
>>>>> > for your imported modules but not for the basic Fission style debug
>>>>> > reference.
>>>>> >
>>>>> > Well, maybe - I'm not sure what you're picturing in terms of the DWARF
>>>>> > in the module for submodules? If you want that granularity we'll have
>>>>> > to talk about how to split the DWARF in the module into chunks per
>>>>> > submodule?
>>>>> >
>>>>> >>
>>>>> >>>
>>>>> >>> But if we don’t mind having duplicate dwo_* references in the same .o
>>>>> >>> file this would also work with more than one TAG_module (or
>>>>> >>> submodules).
>>>>> >>>
>>>>> >>>
>>>>> >>> .debug_info:
>>>>> >>>   DW_TAG_compile_unit
>>>>> >>>     DW_AT_name(“bar.c”)
>>>>> >>>     ...
>>>>> >>>
>>>>> >>>     DW_TAG_imported_module // <- This could be optional on ELF.
>>>>> >>>       DW_AT_import [DW_FORM_ref_sig8] (0xFEDB9876)
>>>>> >>>
>>>>> >>>     ...
>>>>> >>>
>>>>> >>> // Comdat’d split DWARF skeleton CU for the module Foo.
>>>>> >>> .debug_info, group 0xFEDB9876, comdat
>>>>> >>>   DW_TAG_compile_unit
>>>>> >>>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>>>>> >>>     DW_AT_dwo_id(“0xFEDB9876”)
>>>>> >>>     ...
>>>>> >>>
>>>>> >>>     DW_TAG_module
>>>>> >>>       DW_AT_name(“FooLib”)
>>>>> >>>       DW_AT_LLVM_sysroot(“/”)
>>>>> >>>       DW_AT_LLVM_include_dirs(“-I/path”)
>>>>> >>>       DW_AT_LLVM_macros(“-DNDEBUG”)
>>>>> >>>       ...
>>>>> >>>
>>>>> >>>
>>>>> >>> >
>>>>> >>> > But this gets into complicated territory when the original binary
>>>>> >>> > is built with fission... which will be relevant for modules on ELF
>>>>> >>> > with LLDB. Hmm, maybe it's not too complicated - the partial_unit
>>>>> >>> > would end up in the .dwo file (maybe we'd have to teach the .dwo
>>>>> >>> > file to deduplicate these too - the same way it does for type
>>>>> >>> > units... - might require a new header to include the hash, etc
>>>>> >>> > :/)... would be tricky to have the dwp tool resolve the relocations
>>>>> >>> > to these things.
>>>>> >>> > Cross-unit references as you've got there aren't
>>>>> >>> > something that every DWARF consumer is totally cool with, I don't
>>>>> >>> > think?
>>>>> >>>
>>>>> >>> Ah. I thought the deduplication happens because all ELF sections
>>>>> >>> sharing the same group are uniqued based on the group id.
>>>>> >>>
>>>>> >>> COMDAT groups deduplicate for a normal non-fission build, but fission
>>>>> >>> linking doesn't require the .dwo file to use/contain COMDATs as it
>>>>> >>> uses a DWARF-aware tool (so you don't bother putting the type units
>>>>> >>> in COMDAT groups, for example - the fission linker knows how to parse
>>>>> >>> debug_types, find the type unit headers and their hashes and
>>>>> >>> deduplicates them that way).
>>>>> >>
>>>>> >> Ok that makes sense.
>>>>> >>
>>>>> >> -- adrian
>>>>> >>
>>>>> >>>
>>>>> >>> It certainly would be nice if we could avoid introducing a new
>>>>> >>> .debug_info header...
>>>>> >>>
>>>>> >>> >
>>>>> >>> > Sort of inclined to have the imported module stuff just for LLDB,
>>>>> >>> > but I've lost some of the context for that in the ensuing weeks.
>>>>> >>>
>>>>> >>> -- adrian
>>>>> >>>
>>>>> >>> >
>>>>> >>> >>
>>>>> >>> >>
>>>>> >>> >>
>>>>> >>> >>
>>>>> >>> >> MachO (no typeunits, no comdats, with imports)
>>>>> >>> >> ----------------------------------------------
>>>>> >>> >>
>>>>> >>> >> Since we don’t have comdat sections in Mach-O and we don’t have
>>>>> >>> >> the tool support for type units, the way that external types can
>>>>> >>> >> be referenced necessarily needs to be a bit different. The design
>>>>> >>> >> that Greg and I came up with for Mach-O relies on llvm-dsymutil to
>>>>> >>> >> fix up the DWARF for non-module-aware consumers. Just as ELF DWARF
>>>>> >>> >> consumers need not be able to tell the difference between module
>>>>> >>> >> debugging and split DWARF, on Mach-O the .dSYM bundle generated by
>>>>> >>> >> llvm-dsymutil looks like traditional DWARF.
>>>>> >>> >>
>>>>> >>> >> There are three differences in the DWARF output that make this
>>>>> >>> >> possible:
>>>>> >>> >> - Refer to external types by UID rather than by type signature.
>>>>> >>> >>   (This doubles as the key that allows a debugger to import the
>>>>> >>> >>   type directly from the AST and protects us against hash
>>>>> >>> >>   collisions)
>>>>> >>> >> - Add an index to the .o file that maps UID -> module file.
>>>>> >>> >>   (Fast lookup + UIDs for C and ObjC are only unique within a
>>>>> >>> >>   module)
>>>>> >>> >> - Add an entry for each type’s UID to the types accelerator
>>>>> >>> >>   table.
>>>>> >>> >>   (Fast lookup)
>>>>> >>> >>
>>>>> >>> >> bar.o
>>>>> >>> >> ~~~~~
>>>>> >>> >>
>>>>> >>> >> .debug_info:
>>>>> >>> >>   DW_TAG_compile_unit
>>>>> >>> >>     DW_AT_name(“bar.c”)
>>>>> >>> >>     DW_TAG_imported_module
>>>>> >>> >>       DW_AT_import(DW_FORM_ref_addr 0x40)
>>>>> >>> >>
>>>>> >>> >>     DW_TAG_variable
>>>>> >>> >>       DW_AT_name(“MyFoo”)
>>>>> >>> >>       DW_AT_type [DW_FORM_strp] (“_ZTS3Foo”) // We could use a
>>>>> >>> >>                                              // custom FORM here
>>>>> >>> >>
>>>>> >>> >>   // Skeleton unit.
>>>>> >>> >>   DW_TAG_compile_unit
>>>>> >>> >>     DW_AT_dwo_name(“/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm”)
>>>>> >>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>>>>> >>> >>     ...
>>>>> >>> >> 0x40:
>>>>> >>> >>     DW_TAG_module
>>>>> >>> >>       DW_AT_name(“FooLib”)
>>>>> >>> >>       DW_AT_LLVM_sysroot(“/”)
>>>>> >>> >>       DW_AT_LLVM_include_dirs(“-I/path”)
>>>>> >>> >>       DW_AT_LLVM_macros(“-DNDEBUG”)
>>>>> >>> >>
>>>>> >>> >> // This index uses the usual accelerator table format.
>>>>> >>> >> .apple_exttypes:
>>>>> >>> >>   { “_ZTS3Foo” => debug_str offset of
>>>>> >>> >>     “/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm” }
>>>>> >>> >>
>>>>> >>> >> FooLib-XYZ.pcm
>>>>> >>> >> ~~~~~~~~~~~~~~
>>>>> >>> >>
>>>>> >>> >> .debug_info
>>>>> >>> >>   DW_TAG_compile_unit
>>>>> >>> >>     DW_AT_dwo_id(“0xFEDB9876”)
>>>>> >>> >>
>>>>> >>> >> 0x80:
>>>>> >>> >>     DW_TAG_structure_type
>>>>> >>> >>       DW_AT_name (“Foo”)
>>>>> >>> >>       DW_AT_signature
>>>>> >>> >>       ...
>>>>> >>> >>
>>>>> >>> >> // In addition to the entry for “Foo”, there is also an entry for
>>>>> >>> >> // the type’s UID “_ZTS3Foo” pointing to the type definition DIE.
>>>>> >>> >> .apple_types
>>>>> >>> >>   { “Foo” => 0x80 }
>>>>> >>> >>   { “_ZTS3Foo” => 0x80 }
>>>>> >>> >>
>>>>> >>> >>
>>>>> >>> >>
>>>>> >>> >> When the debug info linker (llvm-dsymutil) is run, it first pulls
>>>>> >>> >> in the .debug_info section from the clang module and fixes up all
>>>>> >>> >> the DW_FORM_strp external type references by turning them into a
>>>>> >>> >> DW_FORM_ref_addr that references the type in the
>>>>> >>> >> DW_TAG_compile_unit pulled in from the module. To find the correct
>>>>> >>> >> type DIE it looks up the UID in the .apple_exttypes index, finds
>>>>> >>> >> the module, looks up the UID in the regular .apple_types
>>>>> >>> >> accelerator table and replaces the temporary DW_FORM_strp with a
>>>>> >>> >> DW_FORM_ref_addr (which incidentally takes up the same amount of
>>>>> >>> >> space in the DIE).
>>>>> >>> >>
>>>>> >>> >>
>>>>> >>> >> Thoughts?
>>>>> >>> >> --
>>>>> >>> >> adrian
>>>>> >>> >>
>>>>> >>> >
>>>>> >>>
>>>>> >>
>>>>> >>
>>>>> >
>>>
>>>
>>
>>
>
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
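
The two-step lookup that the Mach-O proposal describes for llvm-dsymutil (UID to module via .apple_exttypes, then UID to DIE via the module's .apple_types) can be sketched roughly as below. This is only an illustration in Python, not dsymutil's code. The dictionaries standing in for the accelerator tables, the function name resolve_external_type, and the module_base map (where each module's unit lands in the linked output, here an arbitrary 0x1000) are all assumptions made for the example; the thread does not specify any of them.

```python
# Hypothetical stand-in for .apple_exttypes in bar.o:
# external type UID -> path of the module file that defines it.
APPLE_EXTTYPES = {
    "_ZTS3Foo": "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm",
}

# Hypothetical stand-in for the .apple_types table of each module:
# type name or UID -> DIE offset inside that module's .debug_info.
MODULE_APPLE_TYPES = {
    "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm": {
        "Foo": 0x80,
        "_ZTS3Foo": 0x80,
    },
}


def resolve_external_type(uid, exttypes, module_types, module_base):
    """Return a linked .debug_info offset for an external type UID.

    module_base maps a module path to the offset at which that module's
    compile unit was placed in the linked output. How a real linker
    tracks this is not described in the thread; it is assumed here.
    """
    module_path = exttypes[uid]                   # UID -> module file
    die_offset = module_types[module_path][uid]   # UID -> DIE in the module
    return module_base[module_path] + die_offset  # value for DW_FORM_ref_addr


# Assume the module's unit was copied to offset 0x1000 of the linked output.
module_base = {
    "/tmp/org.llvm.clang/ModuleCache/1234ABCDE/FooLib-XYZ.pcm": 0x1000,
}
linked_offset = resolve_external_type(
    "_ZTS3Foo", APPLE_EXTTYPES, MODULE_APPLE_TYPES, module_base
)
print(hex(linked_offset))
```

Because the UID is looked up in a per-module table only after the module has been identified, UIDs need to be unique only within a module, which matches the note about C and ObjC in the list of differences.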
